CMYK JPEGs extracted from PDF appear inverted


I am having to deal with CMYK JPEGs extracted from a PDF source. The PDFs were created with Photoshop.

The problem is that Photoshop stores JPEG CMYK data in PDF/EPS using "normal" values, whereas in standalone JPEGs it stores inverted values. So, when the DCTDecode streams are extracted bytewise and written to disk, the resulting JPEG files appear inverted.

(The actual extraction is done by an in-house utility, which simply extracts the bytes from the DCTDecode stream and writes them, unmodified, to a file ending in .jpg. It's basically a binary copy-and-paste. The PDFs are available to re-process, should that be required.)

As the images must remain in their JFIF format, is there any way to place a marker into the extracted .jpg file to make Photoshop open it with the proper encoding? The process must be lossless (not involve further entropy encoding).

The JPEGs already contain the APP14 marker, and removing it has no effect.

Below is a quote from the libjpeg docs:

"... it appears that Adobe Photoshop writes inverted data in CMYK JPEG files: 0 represents 100% ink coverage, rather than 0% ink as you'd expect. ... Photoshop 3.0 [and newer]... write uninverted YCCK in EPS/JPEG files... (But the data polarity used in bare JPEG files will not change...)"

12/2/2012 3:14:00 AM

The same problem came up on the Adobe forums, and it was resolved there:

Maybe the APP14 tag is not correct? There's more to an APP14 tag than just being present. On JPEG tags:

JPEG Adobe Tags

The "Adobe" APP14 segment stores image encoding information for DCT filters. This segment may be copied or deleted as a block using the Extra "Adobe" tag, but note that it is not deleted by default when deleting all metadata because it may affect the appearance of the image.

║ Index2 ║     Tag Name     ║ Writable ║               Values / Notes               ║
║      0 ║ DCTEncodeVersion ║ N        ║                                            ║
║      1 ║ APP14Flags0      ║ N        ║ Bit 15 = Encoded with Blend=1 downsampling ║
║      2 ║ APP14Flags1      ║ N        ║                                            ║
║      3 ║ ColorTransform   ║ N        ║ 0 = Unknown (RGB or CMYK)                  ║
║        ║                  ║          ║ 1 = YCbCr                                  ║
║        ║                  ║          ║ 2 = YCCK                                   ║
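To check what the APP14 segment in an extracted file actually says, the fields from the table can be read directly from the bytes. The following is only a rough sketch: the function name read_adobe_app14 is made up for illustration, and it assumes the usual Adobe segment layout (the ASCII identifier "Adobe", then a 2-byte version, two 2-byte flag words and a 1-byte color transform, all big-endian). It only inspects the header segments and never touches the entropy-coded data.

```python
import struct

def read_adobe_app14(data: bytes):
    """Return the Adobe APP14 fields of a JPEG byte string, or None if absent.

    Hypothetical helper for illustration; it walks the marker segments
    that precede the first scan and stops at SOS or EOI.
    """
    if data[:2] != b"\xff\xd8":                      # SOI must come first
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                           # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):                   # SOS or EOI: stop looking
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone, no length field
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]     # length includes its own 2 bytes
        if marker == 0xEE and payload[:5] == b"Adobe" and len(payload) >= 12:
            version, flags0, flags1, transform = struct.unpack(">HHHB", payload[5:12])
            return {
                "DCTEncodeVersion": version,
                "APP14Flags0": flags0,
                "APP14Flags1": flags1,
                "ColorTransform": transform,
            }
        pos += 2 + length
    return None

# Synthetic example: SOI, one Adobe APP14 segment (ColorTransform = 2, YCCK), EOI.
sample = (b"\xff\xd8\xff\xee" + struct.pack(">H", 14) + b"Adobe"
          + struct.pack(">HHHB", 100, 0x8000, 0, 2) + b"\xff\xd9")
info = read_adobe_app14(sample)
print(info)
```

For a real file, pass the contents of the extracted .jpg (read in binary mode) instead of the synthetic sample.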

But that might not help. I recall someone stating that these private markers aren't intended to guide PDF readers; proper Decode arrays are.

The magic seems to be

/Decode [1 0 1 0 1 0 1 0]

which would invert the color mapping; [0 1 0 1 0 1 0 1] is the default for DeviceCMYK and leaves the values unchanged. (I guess there's a corresponding flag in libjpeg; something similar should be available in any similar tool.) Decode arrays are common in PDFs according to the PDF Reference here:

I have no clue whether you can add these Decode arrays to the JPEGs in the PDF or whether you need to handle them in the stream processing of your in-house tool. I have no example PDF to work on, so I can't do any further research. (Also, the reference is huge, tl;dr, but you might have to read it.)
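For what it's worth, here is a small sketch of what a Decode array does to the sample values, following the linear mapping the PDF Reference describes for image samples: each pair (Dmin, Dmax) maps a raw sample x of n bits to Dmin + x * (Dmax - Dmin) / (2^n - 1). The function names are made up for illustration. This only models the arithmetic; it does not read a PDF or a JPEG.

```python
def apply_decode(sample, d_min, d_max, bits=8):
    """Map one raw sample to a color component value via a (Dmin, Dmax) pair."""
    max_sample = (1 << bits) - 1
    return d_min + sample * (d_max - d_min) / max_sample

def decode_pixel(samples, decode, bits=8):
    """Apply a flat Decode array [Dmin0 Dmax0 Dmin1 Dmax1 ...] to one pixel."""
    pairs = zip(decode[0::2], decode[1::2])
    return [apply_decode(s, lo, hi, bits) for s, (lo, hi) in zip(samples, pairs)]

DEFAULT_CMYK = [0, 1, 0, 1, 0, 1, 0, 1]    # identity mapping for DeviceCMYK
INVERTED_CMYK = [1, 0, 1, 0, 1, 0, 1, 0]   # flips every component

pixel = [0, 255, 0, 255]                   # raw C, M, Y, K samples
normal = decode_pixel(pixel, DEFAULT_CMYK)
flipped = decode_pixel(pixel, INVERTED_CMYK)
print(normal, flipped)
```

So if the image dictionary in the PDF carries a flipping Decode array, a byte-for-byte copy of the stream loses that information, which would explain why the standalone file looks inverted.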

2/13/2013 6:02:00 AM