I am trying to reproduce the solution shown here, but with no luck.

Basically, Ivan Kuckir managed to decode a PDF 1.6 xref stream by first decrypting it and then decompressing it. That stream, like mine, belongs to an encrypted PDF file. One issue, however, is that the PDF 1.6 spec states on p. 83 that "The cross-reference stream must NOT be encrypted, nor may any strings appearing in the cross-reference stream dictionary. It must not have a Filter entry that specifies a Crypt filter (see 3.3.9, “Crypt Filter”)." What I understand from this is that, like the cross-reference tables before them, xref streams must not be encrypted.

When I try to inflate the stream, the zlib DLL crashes. It also crashes when I decrypt first and then inflate. Has anyone managed to reproduce Ivan Kuckir's solution? Thanks.

P.S. I tried to ask the question in the above thread but for some reason it was deleted by the admin...

This is the link to the object: https://drive.google.com/file/d/1DwOf3zarg9p_B8DNZ2gZdaBr43NKDWR3/view?usp=sharing I replaced the stream characters with a hex string so that it can be pasted safely.

C MGL
  • Can you share the PDF in question, so we can easily tell what's going on? Also you use the PDF 1.6 reference - PDF has been an ISO standard for more than a dozen years now, so one generally should argue using the ISO standard. Also the Adobe references and the ISO standards have been created with compatibility in mind, so ISO 32000 also applies to PDF-1.6 files. The section references in [@gettalong's answer](https://stackoverflow.com/a/72082491/1729265) also denote sections in the ISO standard... – mkl May 02 '22 at 07:05

1 Answer

As you read in the spec, xref streams are not encrypted, so you don't need to decrypt the stream itself or any strings in the xref stream dictionary. What you do need to take into account when decoding the stream are the /Filter and /DecodeParms entries.

Most of the time an xref stream uses the /FlateDecode filter together with predictor parameters, which allow for better compression because of the way an xref stream is structured. So have a look at sections 7.4.4.1 and 7.4.4.4 of the PDF specification.
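The decoding described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not code from the thread: the function names `undo_png_predictor` and `decode_xref_stream` are hypothetical, the stream is assumed to use a single /FlateDecode filter, and only PNG-style predictors (values 10 to 15) are handled. The TIFF predictor (value 2) is deliberately left out.

```python
import zlib


def undo_png_predictor(data: bytes, columns: int, colors: int = 1, bpc: int = 8) -> bytes:
    """Reverse PNG prediction on already-inflated data (hypothetical helper)."""
    bpp = max(1, colors * bpc // 8)              # bytes per complete sample group
    row_len = (columns * colors * bpc + 7) // 8  # payload bytes per row
    stride = row_len + 1                         # plus one leading filter-type byte
    if len(data) % stride:
        raise ValueError("data length is not a multiple of the row size")
    out = bytearray()
    prev = bytearray(row_len)                    # the row above the first row is all zeros
    for start in range(0, len(data), stride):
        ftype = data[start]                      # filter type may differ from row to row
        row = bytearray(data[start + 1:start + stride])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 0:                       # None
                pred = 0
            elif ftype == 1:                     # Sub
                pred = left
            elif ftype == 2:                     # Up
                pred = up
            elif ftype == 3:                     # Average
                pred = (left + up) // 2
            elif ftype == 4:                     # Paeth
                p = left + up - up_left
                pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
                if pa <= pb and pa <= pc:
                    pred = left
                elif pb <= pc:
                    pred = up
                else:
                    pred = up_left
            else:
                raise ValueError(f"unknown PNG filter type {ftype}")
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)


def decode_xref_stream(raw: bytes, columns: int, predictor: int = 1) -> bytes:
    """Inflate the raw stream bytes, then undo the predictor if one is set."""
    inflated = zlib.decompress(raw)
    if predictor == 1:                           # 1 means no prediction
        return inflated
    if predictor >= 10:                          # 10..15 are the PNG predictors
        return undo_png_predictor(inflated, columns)
    raise NotImplementedError("TIFF predictor (2) is not covered by this sketch")
```

Here `raw` stands for the bytes between `stream` and `endstream`, and `columns` and `predictor` stand for the /Columns and /Predictor values read from the /DecodeParms dictionary.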

gettalong
  • Hello and thank you all for your answers. I am showing the object in question below. I have replaced the stream characters with a hex string that is easier to paste here. What I have noticed is that the header 68DE passes the zlib header check, because 0x68DE is divisible by 31. Please see also https://groups.google.com/g/comp.compression/c/_y2Wwn_Vq_E/m/EymIVcQ52cEJ where it is stated that a 68DE header is valid but rare – C MGL May 02 '22 at 15:03
  • This is the link to the object in question: https://drive.google.com/file/d/1DwOf3zarg9p_B8DNZ2gZdaBr43NKDWR3/view?usp=sharing – C MGL May 02 '22 at 15:18
  • @CMGL Thanks for the object. As you can see it uses `FlateDecode` with `DecodeParms`. The value of `/Predictor` is 12, meaning PNG encoding using the "Up" filter method (see https://www.w3.org/TR/PNG/#9Filters). So to decode the stream, first convert the hex string into a binary string, then decode that binary string using the Flate method (see PDF spec 7.4.4), then apply the PNG filter method as specified by the predictor. The result is the completely decoded and filtered xref stream. Finally, see PDF spec 7.5.8 for information on the values of the decoded bytes. – gettalong May 03 '22 at 18:40
  • Thanks a million for your great help. I went back and did a detailed check on my code. The problem was that I had not realised I should step through the inflated data in rows of 5 bytes instead of 4 when undoing the predictor: each row carries an extra leading byte naming its filter type, because the prediction may vary by row. Once I did that, it worked out perfectly. Thank you once again for pointing me in the right direction. All the best! – C MGL May 03 '22 at 21:22
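
For the last step mentioned in the comments (interpreting the decoded bytes per PDF spec 7.5.8), a minimal Python sketch might look like this. The function name `parse_xref_rows` is hypothetical, and the field widths are an assumption for illustration: they would come from the /W array of the xref stream dictionary, for example `[1 2 1]`, which adds up to the 4 data columns discussed above. Each field is read as a big-endian integer, and a zero-width type field defaults to type 1.

```python
def parse_xref_rows(decoded: bytes, widths):
    """Split fully decoded xref stream data into (type, field2, field3) tuples.

    `widths` holds the three byte widths from the /W array (assumed input).
    """
    w1, w2, w3 = widths
    size = w1 + w2 + w3
    entries = []
    for pos in range(0, len(decoded) - size + 1, size):
        row = decoded[pos:pos + size]
        # Type 0 = free, 1 = in use (offset, generation),
        # 2 = compressed (object stream number, index within it).
        ftype = int.from_bytes(row[:w1], "big") if w1 else 1
        field2 = int.from_bytes(row[w1:w1 + w2], "big")
        field3 = int.from_bytes(row[w1 + w2:], "big")
        entries.append((ftype, field2, field3))
    return entries
```

The entries come out in object-number order, starting from the first object number given by /Index (or 0 if /Index is absent).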