I have PDF files that were produced by merging multiple TIFF files, so every page is actually an image XObject. Opened in PDF Walker, every page looks like this:
7 0 obj
<<
/Type /XObject
/Subtype /Image
/Width 1653
/Height 2339
/BitsPerComponent 4
/ColorSpace [ /Indexed /DeviceRGB 15 8 0 R ]
/DecodeParms [ <<
/Columns 1653
>> ]
/Filter [ /FlateDecode ]
/Length 219260
>>
I found that the related content stream was generated incorrectly: it is missing the last few lines of image data. When I try to open such a PDF in Acrobat Reader, it shows the error message "Insufficient data for an image". The problem can be resolved by lowering the image height by a fixed constant, e.g. 10. For example, the text /Height 2339 should be updated to e.g. /Height 2330, which is sufficient to overcome the issue.
If this had to be done in a text file, I would use regular expressions to find the page heights and update them as needed. But I am not sure how best to handle such an update in a binary file.
Note: I am not asking how to read or write binary files. The PDF files can be loaded into memory, e.g. as a byte array. The question is about how to handle the problem efficiently, without looping through the array, comparing each window of bytes against the sequence /Height, and then parsing the next couple of bytes, which should represent the number of pixels.
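To make the idea concrete, here is a minimal sketch of what a regex-based patch over the raw bytes could look like in Python, whose re module accepts bytes patterns. The function name lower_heights and the delta parameter are made up for illustration. The sketch assumes the image dictionaries are stored uncompressed, as in the dump above, and that the byte sequence /Height does not also occur by chance inside binary stream data. The new number is padded with trailing spaces to the old width so that no byte offsets shift and the cross-reference table stays valid.

```python
import re

# "/Height", some whitespace, then a decimal integer (bytes pattern).
HEIGHT_RE = re.compile(rb"(/Height\s+)(\d+)")


def lower_heights(pdf_bytes: bytes, delta: int = 10) -> bytes:
    """Return a copy of pdf_bytes with every /Height value reduced by delta.

    Hypothetical helper: it patches every match, not only broken images.
    """

    def patch(match):
        old_digits = match.group(2)
        new_value = max(int(old_digits) - delta, 1)
        new_digits = str(new_value).encode("ascii")
        # Keep the byte length unchanged: pad with trailing spaces, which
        # are plain whitespace between tokens in a PDF dictionary.
        return match.group(1) + new_digits.ljust(len(old_digits), b" ")

    return HEIGHT_RE.sub(patch, pdf_bytes)


# In-memory example using the dictionary entry from the question.
sample = b"/Width 1653\n/Height 2339\n/BitsPerComponent 4\n"
patched = lower_heights(sample, delta=9)
print(patched)  # b'/Width 1653\n/Height 2330\n/BitsPerComponent 4\n'
```

For a real file, the bytes would be read with open(path, "rb"), passed through the function, and written back with open(path, "wb"). If the dictionaries live inside compressed object streams, this kind of search will not find them, and a PDF library would be needed instead.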