
I'm deep in the weeds reverse engineering a very old proprietary document storage format (Keyfile). Embedded in the middle of a larger file is a block of image data (the scan of a single document page) that is encoded with CCITT4. I've learned enough about the file and the TIFF spec so far to write a filter that extracts the data from the source file and writes a new file that is supposed to be a plain TIFF, but it's not quite there yet, and I can't figure out what I'm still missing.

Encouragingly, Adobe Photoshop opens my newly minted TIFF file and displays the document just fine (no errors, no warnings). Unfortunately, none of the other common tools will. I'm on a Mac and have access to Linux, so I've tried:

  • Gimp
  • Preview (OSX)
  • ImageMagick
  • some of the libtiff utilities like fax2pdf

I suspect there's still something wrong with my TIFF file that Photoshop is silently overlooking. I hope it's not in the raw CCITT4 image data, because I would rather not have to write code to decode that completely.

I can't post the files I'm working with because they contain sensitive data. However, I'm hoping that I'm just doing something wrong with my TIFF header block that someone can point out. To that end, here's some basic information about my test file (the one that opens fine in Photoshop).

 Keyfile.tiff 31K (32300 bytes)
 Keyfile TIFF Version 1.01
   0100.0004.00000001.000009f0 ImageWidth
   0101.0004.00000001.00000ce0 ImageLength
   0102.0003.00000001.00000001 BitsPerSample
   0103.0003.00000001.00000004 Compression
   0106.0003.00000001.00000000 PhotometricInterpretation
   0111.0004.00000001.00000200 StripOffsets
   0115.0003.00000001.00000001 SamplesPerPixel
   0116.0004.00000001.00000ce0 RowsPerStrip
   0117.0004.00000001.00007c2c StripByteCounts
   011a.0005.00000001.000001d6 XResolution
   011b.0005.00000001.000001de YResolution
   0128.0004.00000001.00000002 ResolutionUnit
   0131.0002.0000001a.000001e6 Software

This decode of the TIFF header block comes from code that I've written. Here's a hex dump of the header portion of the file to address 0x200.

49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E303100
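For anyone who wants to reproduce the tag listing from the dump, here is a minimal sketch in Python. It is not the code that produced the listing above; the names `parse_ifd`, `type_mismatches`, and `TAGS` are made up for illustration, and it assumes a little-endian (`II`) file with a single IFD. The allowed field types are taken from the TIFF 6.0 tag definitions. On this header the only entry it flags is ResolutionUnit, which is stored as LONG (type 4) where the spec lists SHORT (type 3); whether any reader actually objects to that is not established here.

```python
import struct

# First 0xAA bytes of the header dump above: 8-byte TIFF header, entry count,
# 13 twelve-byte IFD entries, and the 4-byte next-IFD offset.
HEADER_HEX = (
    "49492A0008000000" "0D00"
    "0001040001000000F0090000"
    "0101040001000000E00C0000"
    "020103000100000001000000"
    "030103000100000004000000"
    "060103000100000000000000"
    "110104000100000000020000"
    "150103000100000001000000"
    "1601040001000000E00C0000"
    "17010400010000002C7C0000"
    "1A01050001000000D6010000"
    "1B01050001000000DE010000"
    "280104000100000002000000"
    "310102001A000000E6010000"
    "00000000"
)

# Tag -> (name, field types the TIFF 6.0 spec lists for it).
# Types: 2 = ASCII, 3 = SHORT, 4 = LONG, 5 = RATIONAL.
TAGS = {
    0x0100: ("ImageWidth", {3, 4}),
    0x0101: ("ImageLength", {3, 4}),
    0x0102: ("BitsPerSample", {3}),
    0x0103: ("Compression", {3}),
    0x0106: ("PhotometricInterpretation", {3}),
    0x0111: ("StripOffsets", {3, 4}),
    0x0115: ("SamplesPerPixel", {3}),
    0x0116: ("RowsPerStrip", {3, 4}),
    0x0117: ("StripByteCounts", {3, 4}),
    0x011A: ("XResolution", {5}),
    0x011B: ("YResolution", {5}),
    0x0128: ("ResolutionUnit", {3}),
    0x0131: ("Software", {2}),
}


def parse_ifd(buf):
    """Return the first IFD as a list of (tag, type, count, value_or_offset)."""
    if buf[:2] != b"II":
        raise ValueError("this sketch only handles little-endian (II) files")
    magic, ifd_offset = struct.unpack_from("<HI", buf, 2)
    if magic != 42:
        raise ValueError("not a TIFF header")
    (count,) = struct.unpack_from("<H", buf, ifd_offset)
    return [
        struct.unpack_from("<HHII", buf, ifd_offset + 2 + 12 * i)
        for i in range(count)
    ]


def type_mismatches(entries):
    """Tags whose stored field type is not one the spec lists for that tag."""
    return [t for t, typ, _n, _v in entries if t in TAGS and typ not in TAGS[t][1]]


entries = parse_ifd(bytes.fromhex(HEADER_HEX))
for tag, typ, n, value in entries:
    name = TAGS.get(tag, ("?", set()))[0]
    print(f"{tag:04x}.{typ:04x}.{n:08x}.{value:08x} {name}")
print("unexpected field types:", [hex(t) for t in type_mismatches(entries)])
```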

What follows is exactly 0x7c2c bytes of compressed image data. I say this based on the TIFF compression tag (4), which is copied over intact from the original file, and from looking at dozens and dozens of files with a hex editor and learning to recognize the image data block. Also, the fact that Photoshop opens this file would seem to indicate I am correct.
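The byte counts can at least be checked against the file size. A small sketch (the helper name `strip_slack` is hypothetical) using the figures from the listing for Keyfile.tiff: StripOffsets 0x200, StripByteCounts 0x7c2c, file size 32300 bytes.

```python
def strip_slack(file_size, strip_offset, strip_byte_count):
    """Bytes left in the file after the strip ends.

    Zero means the strip runs exactly to the end of the file;
    a negative result means the strip would overrun the file.
    """
    return file_size - (strip_offset + strip_byte_count)


# Keyfile.tiff: 0x200 + 0x7c2c = 512 + 31788 = 32300
print(strip_slack(32300, 0x200, 0x7C2C))
```

A result of 0 only shows that the header and the file length agree with each other. It says nothing about whether the strip really begins at 0x200 or whether the data is valid Group 4.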

Any help figuring out what I still need to do to make this file compatible with the rest of the utilities would be much appreciated.

For what it's worth here's the error produced by imagemagick:

>convert Keyfile.tiff Keyfile.pdf

convert: Premature EOL at line 0 of strip 0 (got 0, expected 2544). `Fax4Decode' @ warning/tiff.c/TIFFWarnings/881.

I'm new to coding for TIFF and so any utilities or hints that would allow me to gather more detailed information about what's going on would also be appreciated.

Update:

Here are the first 0x318 bytes of the file. There's nothing sensitive here and you have the first 0x118 bytes of the image data. I can probably provide a bit more of the file if needed.

49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFC8085B51FFFFFFFFFFFFFFFFFFFFFFFF90154E0C4221836AC80A900F04142050814204679705E823C0D3089900E92D641B9B1D2907364E94886C112854118E6208686E6492B47D11C1A29289806DC25083A41427495102E6D349641736AA96439B08496113867960B314A08CC1A2102141410221AADC28102123E918508E02AC41143D2C5131C3C68B1620B8CCB02A8238F564536394D16F11AA050CEA8A9944105DB92591D12D04513E195B23E1252561A742191D11B0628110DA6E5259A6881891832C74B704A0C8F1B4618450E2AA4087391D17988888EA41CDAD8A2B0AAA4436A2647D94CC585
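One detail visible in this dump: the Software string ends with its NUL at 0x1FF, and the bytes from 0x200 (where StripOffsets points) begin with a run of 0x00 before the `FFFFFFFC...` data starts. Whether those zero bytes are part of the compressed stream or are padding carried over from the Keyfile container is an assumption to test, not something confirmed here. A minimal sketch for measuring such a prefix and computing adjusted StripOffsets/StripByteCounts values follows; the helper names are hypothetical and the sample bytes are synthetic, not taken from the real file.

```python
def leading_zero_bytes(strip: bytes) -> int:
    """Count the 0x00 bytes at the very start of the strip data."""
    return len(strip) - len(strip.lstrip(b"\x00"))


def trimmed_strip_fields(strip_offset: int, strip_byte_count: int, strip: bytes):
    """StripOffsets/StripByteCounts values that would skip a leading zero run.

    This only computes candidate values for an experiment; it assumes the
    zero run is not part of the compressed data, which may be wrong.
    """
    skip = leading_zero_bytes(strip)
    return strip_offset + skip, strip_byte_count - skip


# Synthetic example: three zero bytes followed by some non-zero data.
sample = bytes(3) + bytes.fromhex("FFFFFFFC8085B51F")
print(leading_zero_bytes(sample))
print(trimmed_strip_fields(0x200, len(sample), sample))
```

Writing a test copy of the TIFF with the adjusted values and feeding it to the libtiff-based tools would show whether the prefix matters.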

Update 2:

OK, I found a file that I can post. It's a mostly white page, but if rendered correctly, you will see the two darkish crescent moons which are the reflection of the holes on the original scanned page. There's also a bit of noise over to the right and along the top. Here's what it looks like (image):

Sample problem page

I used Photoshop to convert/save a file I could upload. Here's a hex dump of the file my code generated, which opens fine in Photoshop, but not with anything else.

49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C00001701040001000000530300001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031002C19461170350282E88E8AF52889A91024623806A1C8F97C8E8D111D1847115B44CF3A2388DA2E8C2388122F98C868E23451112508B88600D4297C8E88E44788F91E308BC4745CC8F91E23A2EC8E88F11E23B36447C8F11CC8E611020711111A6888390E39C738E0848E8BA23A388D4A224111B03681C206478DA892946E2E06D06B51121718036032092844E0AE470350604AA229C88E0680CC224511803402E24A11F88E0660D8224A40CD1016ACC8E0B606048906482C101752460C8E19006E224AC3203901D091B03C08122D9C0DA12141BFFFFFFFFFFFFFFFFFFFFFFFFFFFFF2D2125082123F1A2EA08124122EB6820A475E2105130A8209826474388886475612449543B295550C8E88224EC591D1174295B23A48C0EC591E08762111E23A2F9F46D11D02E22323E088A3870447542223EE35BDF56AD5856AD430A1856AC2879692C06C2FC304259A688BA23D2D23211A4088FC504162A5373447C20A2396062188A891F23F7C48E89502F41A46D11B417126E51328709EDE4747D04171D8B23A650E5714E13158921F111588AB0AF72CA6AB50ED27690664750C286B6B1B29D351609F21976B8685A8613C309A96014631FFFFFFFFFFFFF2039C720383A5C5DFEB56B0B51FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF9601A8FFFFFFFFFFFFFFFFFFFFFFFFFFCEC6947FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF95CEA3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE5A852A3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC004004

Here are its specs.

Keyfile_66.tiff 1K (1363 bytes)
Keyfile TIFF Version 1.01
 0100.0004.00000001.000009f0 ImageWidth
 0101.0004.00000001.00000ce0 ImageLength
 0102.0003.00000001.00000001 BitsPerSample
 0103.0003.00000001.00000004 Compression
 0106.0003.00000001.00000000 PhotometricInterpretation
 0111.0004.00000001.00000200 StripOffsets
 0115.0003.00000001.00000001 SamplesPerPixel
 0116.0004.00000001.00000ce0 RowsPerStrip
 0117.0004.00000001.00000353 StripByteCounts
 011a.0005.00000001.000001d6 XResolution
 011b.0005.00000001.000001de YResolution
 0128.0004.00000001.00000002 ResolutionUnit
 0131.0002.0000001a.000001e6 Software

Here's a link to download the file.

Any ideas as to why this is happening would be much appreciated.

Andrew
  • Can you please share your TIFF file? It looks like the image was compressed with G3-2D and you're marking it as being compressed as G4. There's a subtle difference - G3-2D has a finite K value (the number of lines compressed as 2D after a line compressed as 1D) while G4 has an infinite K value (all lines compressed as 2D). The EOL marker is only supposed to come at the very end of a G4 file, but it will appear at the end of each line encoded as 1D with the G3-2D scheme. – BitBank Nov 01 '17 at 19:23
  • You likely need tags for FILLORDER, PLANARCONFIG and SAMPLEFORMAT too. – Mark Setchell Nov 01 '17 at 19:26
  • @MarkSetchell There are a minimum number of tags to define such a file. As long as the defaults line up correctly with the actual data, many tags can be skipped. The tags you mention can safely be skipped if the data conforms to the default values, but it's still a good idea to include them :) – BitBank Nov 01 '17 at 19:46
  • BitBank: Well I can't share the file in total for reasons stated above. I'm assuming CCITT4 because in the original file, a TIFF header was present which specified a compression value of 4. The trouble is because the original file is proprietary, the software that reads it could be making all sorts of assumptions that I don't have access to. We can test assumptions though as I have the ability to add arbitrary tags to the TIFF, as well as tweak existing ones at will. So suggestions are welcome. Please be specific though as I'm new to all of this. – Andrew Nov 01 '17 at 20:15
  • @MarkSetchell. Can you tell me why you suggest these additional tags? I've written out the ones that are listed in the TIFF spec as required for Bilevel Images. – Andrew Nov 01 '17 at 20:18
  • I found (empirically) that I needed them for another answer, probably this one https://stackoverflow.com/a/43065286/2836621 – Mark Setchell Nov 01 '17 at 20:27
  • @MarkSetchell That's interesting. Unfortunately, because I don't know anything about the compressed data other than what photoshop can tell me after it successfully opens the file, I'm not sure what I'd use in those tags for values. – Andrew Nov 01 '17 at 20:56
  • I guess you could try saving the file from Photoshop and running `tiffdump` on the output file to see what it used/guessed - though there is no guarantee Photoshop will preserve anything set in your newly minted TIFF. – Mark Setchell Nov 01 '17 at 21:19
  • It would be helpful if you could locate a page which starts out with a bunch of blank lines and dump the first few bytes of compressed data. Then we can see the bit order, and if it's G4 or G3-2D. – BitBank Nov 01 '17 at 22:14
  • @BitBank. this I can do. Will update with a few more bytes of compressed data shortly. – Andrew Nov 02 '17 at 21:51
  • @Mark Setchell. I already tried saving a new file with photoshop. The resulting tiff is openable by other tools at that point, but it's nothing like the original. Photoshop changes everything... not helpful. – Andrew Nov 02 '17 at 21:57
  • @Andrew Thanks for providing more data. It looks like normal G4. I think that error message about a premature EOL may be spurious. – BitBank Nov 02 '17 at 22:05
  • @Andrew can you find a cover page image or something without private info on it? A complete image would allow me to tell you exactly why your code is failing. – BitBank Nov 03 '17 at 21:32
  • @BitBank... OK, I'll try to find something on Sunday. I'm off until then. – Andrew Nov 04 '17 at 00:43
  • @BitBank. I've updated the post and you have an example file now. Interested to know what you see. Thanks for the help. – Andrew Nov 05 '17 at 20:15
  • @Andrew - I couldn't find a link to download the file you want me to test. A horizontal hex dump is not helpful. Please share a link where it can be downloaded. – BitBank Nov 05 '17 at 20:55
  • @BitBank. Link added. I didn't realize the hex dump was a problem. – Andrew Nov 05 '17 at 21:31
  • Link didn't work :( – BitBank Nov 05 '17 at 21:37
  • @BitBank... please try again. I put it somewhere else. – Andrew Nov 05 '17 at 23:06
  • Got it. Now it's making me re-examine my TIFF encoder which produces files that OSX Preview complains about also. I'll have a solution for you shortly... – BitBank Nov 05 '17 at 23:24
  • Update: It appears to have nothing to do with the tags, but instead it has something to do with the data itself. My image library can write 2 TIFFs with identical sets of tags, but different image data and OSX Preview will open one, but not the other. Continuing my investigation... – BitBank Nov 06 '17 at 02:04
  • New info. Gimp (OSX) can open your and my files just fine. The bug appears to be just in OSX Preview. I'm not sure any changes to your TIFF writer will get past what appears to be a bug in Preview. – BitBank Nov 06 '17 at 13:01
  • @BitBank. The original document I posted about does not open with either Gimp or convert with ImageMagick. I tested the one I posted here and confirm that Gimp opens it. Wish I knew what gives. I need a scriptable way to convert these tiffs to PDF files, ideally with open source tools. Incidentally, Gimp "opens" the original tiff (a letter), but renders it as a mostly white page with horizontal streaked "noise" lines across it (total garbage). – Andrew Nov 06 '17 at 18:40
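The bit-order question raised in the comments can be tested directly. TIFF's FillOrder tag (266) defaults to 1, meaning the most significant bit of each byte comes first; a value of 2 means the least significant bit comes first. A minimal sketch, assuming the strip has already been read into memory, that produces a bit-reversed copy of the data so both orders can be tried (the names `BIT_REVERSE` and `reverse_bits` are made up for illustration):

```python
# Lookup table mapping each byte value to the same byte with its bits reversed,
# e.g. 0b00000001 -> 0b10000000.
BIT_REVERSE = bytes(int(f"{i:08b}"[::-1], 2) for i in range(256))


def reverse_bits(data: bytes) -> bytes:
    """Return a copy of data with the bit order of every byte reversed."""
    return data.translate(BIT_REVERSE)


print(reverse_bits(bytes.fromhex("0180F0")).hex())
```

Writing the reversed strip into a test copy of the TIFF (or, equivalently, leaving the data alone and adding a FillOrder entry with value 2) is only an experiment. BitBank's later comment that the data looks like normal G4 suggests the default order may already be correct.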

0 Answers