I'm trying to hide a file inside the code of a PDF file. I've already searched for information to help me. I uncompressed the PDF using pdftk (`pdftk pdf.pdf output uncompress.pdf uncompress`). Then I tried different things, such as:

  • Insert a comment: I put `%TEXT_TO_HIDE` in the uncompressed PDF's code.
  • Add a new object: I put `0 0 obj << TEXT_TO_HIDE << endobj` in the uncompressed PDF's code.
  • Modify an existing object.

Then I compressed it again using pdftk.

In each case, I get a new PDF that looks different from the original. It's not corrupted, but the images have different colors and some of the original text is missing.

So, do you know of any rules for changing a PDF's code without anyone noticing?

(PS: Sorry if my English is bad ^^)

Prygan
  • You should provide additional information, like where you hid the text, what original text was missing, whether it was related in the code to where you hid your message, how different are the images after the compression, etc. I've managed to make the first page blank by hiding my text before the stream starts. If you uncompress a pdf and then compress it again without any modifications, do your images have different colors? I didn't notice a problem with the pdf I tried ([link](http://people.uleth.ca/~roussel/nld/delay.pdf)). – Reti43 Nov 22 '14 at 11:33

1 Answer

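In general, you cannot modify a PDF file in a text editor and expect it to remain compliant. PDF is a binary format, and you need to read the PDF specification to figure out how to modify it. For example, the cross-reference table stores the byte offset of each object, so inserting text shifts those offsets unless the table is rebuilt.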
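To illustrate why a plain text edit can break compliance, here is a minimal sketch in Python. It assembles a tiny PDF by hand, then inserts a comment the way a text editor would and checks whether the cross-reference table still points at the objects. The helper names `build_minimal_pdf` and `xref_is_consistent` are hypothetical, and the check assumes a single classic `xref` table whose in-use entries are numbered from 1. It is not a general PDF validator.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF with a correct cross-reference table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def xref_is_consistent(pdf: bytes) -> bool:
    """True if startxref and every in-use entry point where they claim to."""
    xref_pos = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    if not pdf.startswith(b"xref", xref_pos):
        return False
    # Assumption: one xref subsection, in-use entries are objects 1, 2, 3, ...
    entries = re.findall(rb"(\d{10}) (\d{5}) n", pdf[xref_pos:])
    for num, (off, _gen) in enumerate(entries, start=1):
        if not pdf.startswith(b"%d 0 obj" % num, int(off)):
            return False
    return True


original = build_minimal_pdf()
# Naive text-editor style edit: insert a comment line right after the header.
edited = original.replace(b"%PDF-1.4\n", b"%PDF-1.4\n%TEXT_TO_HIDE\n", 1)

print(xref_is_consistent(original))  # offsets match the objects
print(xref_is_consistent(edited))    # offsets are stale after the insert
```

Many viewers try to repair stale offsets by rescanning the file, so a file edited this way may still open, but the result then depends on the viewer's repair logic rather than on the specification.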

That said, there are heaps of places where you can "hide" information in a PDF document. The real question is how much data you want to hide, and for what purpose. The purpose typically determines how secure this needs to be.

As some examples:

1) PDF allows embedding complete files in the actual PDF file. This is not really secure as anyone with decent software can extract these files (but the file itself could still be secured of course).

2) PDF allows adding arbitrary objects anywhere (or almost anywhere) in the file. This is a great way to hide information, but someone with the right tools can browse the object tree (even if the file is compressed) and see what you did.

3) PDF allows adding, for example, white text on a white background, or text behind other objects. Again, there are ways around this for people with the right software.

4) Adobe's PDF spec allows at least 1K of fluff after the %%EOF marker (although ISO 32000 does not). Keep in mind that this is visible to anyone opening the file with a decent text or binary editor. (Thanks Jongware).
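As a rough illustration of example 4, the sketch below appends a payload after the final `%%EOF` marker and reads it back. The function names `append_after_eof` and `extract_after_eof` are hypothetical, the sample bytes are a stand-in rather than a real PDF, and nothing here checks how much trailing data a given reader tolerates. The payload is stored in the clear, so this hides nothing from anyone who opens the file in an editor.

```python
EOF_MARKER = b"%%EOF"


def append_after_eof(pdf: bytes, payload: bytes) -> bytes:
    """Return the PDF bytes with payload appended after the last %%EOF."""
    if EOF_MARKER not in pdf:
        raise ValueError("no %%EOF marker found")
    if EOF_MARKER in payload:
        # A marker inside the payload would confuse extraction below.
        raise ValueError("payload must not contain %%EOF")
    if not pdf.endswith(b"\n"):
        pdf += b"\n"
    return pdf + payload


def extract_after_eof(pdf: bytes) -> bytes:
    """Return whatever follows the last %%EOF marker, minus leading newlines."""
    idx = pdf.rfind(EOF_MARKER)
    if idx == -1:
        raise ValueError("no %%EOF marker found")
    # Caveat: a payload that itself starts with CR/LF bytes loses them here.
    return pdf[idx + len(EOF_MARKER):].lstrip(b"\r\n")


# Stand-in bytes for demonstration only; a real file would be read from disk.
sample = b"%PDF-1.4\n% ... objects, xref, trailer ...\nstartxref\n0\n%%EOF\n"
stego = append_after_eof(sample, b"TEXT_TO_HIDE")
print(extract_after_eof(stego))
```

Because the original bytes are left untouched and the payload only follows them, none of the byte offsets recorded in the cross-reference table change.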

In short, you need to define much more precisely what you want to accomplish and how "secure" secure needs to be in your use case.

You should also consider how "robust" the method must be. For example, should the hidden data survive someone re-saving your PDF file with Acrobat? Some of the above methods may not be robust enough to guarantee that.

David van Driessche
  • 4) You can write data between objects. 5) Adobe's PDF spec allows *at least* 1K of fluff after the `%%EOF` marker (although [ISO 32000 does not](http://stackoverflow.com/questions/11896858/does-the-eof-in-a-pdf-have-to-appear-within-the-last-1024-bytes-of-the-file)). – Jongware Nov 22 '14 at 14:12
  • Ah, we could have a discussion about your point 4 (though I liked it :-)). According to the PDF specification: "The body of a PDF file shall consist of a sequence of indirect objects representing the contents of a document.". While it may work in many readers, I think this makes your method illegal. You can't even have unreferenced objects: "The table shall contain a one-line entry for each indirect object" says the section about cross-reference tables :-) – David van Driessche Nov 22 '14 at 18:16