I'm curious how PDF securing works. I can lock a PDF file so that the system can't recognize its text or manipulate the file. Everything I found was about how to lock or unlock a PDF, but nothing about how it actually works. Could anyone explain it to me? Thanks.
-
This is an extremely broad question. A PDF can use any number of encryption algorithms, standards-based or proprietary, some of which need plugins for Adobe Reader and some of which don't. You're going to need to be more specific. – joelgeraci Mar 02 '17 at 17:56
-
@joelgeraci Sorry for not being specific. I mean a lock on text recognition or on manipulating the PDF file. IMHO there should be no cryptography involved, just some trick. – Majzlik Mar 03 '17 at 11:50
-
If the file can be printed or even viewed, you might as well have no security at all... print and scan or take a picture of the screen, OCR, and you have the text. – joelgeraci Mar 03 '17 at 15:35
1 Answer
The OP clarified in a comment:
I mean a lock on text recognition or on manipulating the PDF file. IMHO there should be no cryptography involved, just some trick.
There are some options, among them:
You can render the text as a bitmap and include that bitmap in the PDF
-> no text information.
Or you can embed the font in question using a non-standard encoding without using standard glyph names
-> text information in an unknown encoding.
E.g. cf. the PDF analysed in this answer.
A special case: make the encoding wrong for only a few characters, maybe just one, probably a digit. This way an inattentive person thinks everything was extracted OK, and only when the data is actually used do the errors start screwing things up, which is hard to fix, especially in the case of wrong digits. E.g. cf. the PDF analysed in this answer.
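To make the encoding trick more concrete, here is a toy model in Python. It does not parse a real PDF; the function names, the shuffled code table and the sample strings are all made up for illustration. It only assumes that the content stream stores byte codes, that the viewer looks up glyph shapes by code in the embedded font, and that a naive extractor interprets the same codes with a standard encoding.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits + " "

def scrambled_font(seed=0):
    """Hypothetical embedded font: every character gets an arbitrary code."""
    codes = list(range(1, len(ALPHABET) + 1))
    random.Random(seed).shuffle(codes)
    char_to_code = dict(zip(ALPHABET, codes))
    code_to_glyph = {code: ch for ch, code in char_to_code.items()}
    return char_to_code, code_to_glyph

def write_string(text, char_to_code):
    """What the PDF producer stores in the content stream."""
    return bytes(char_to_code[ch] for ch in text)

def render(codes, code_to_glyph):
    """What a viewer shows: it only looks up glyph shapes by code."""
    return "".join(code_to_glyph[c] for c in codes)

def naive_extract(codes):
    """What an extractor gets if it assumes a standard (Latin-1) encoding."""
    return codes.decode("latin-1")

# Fully scrambled encoding: the page looks fine, extraction is garbage.
char_to_code, code_to_glyph = scrambled_font()
stored = write_string("Invoice 2017", char_to_code)
shown = render(stored, code_to_glyph)
extracted = naive_extract(stored)

# Special case: standard codes everywhere, except that the glyphs for
# "3" and "8" are swapped in the (hypothetical) font.
swap = str.maketrans("38", "83")
stored_amount = "Total: 1385.30".translate(swap).encode("latin-1")
shown_amount = stored_amount.decode("latin-1").translate(swap)
extracted_amount = naive_extract(stored_amount)

print(shown)             # what the reader sees
print(repr(extracted))   # what copy & paste yields
print(shown_amount)
print(extracted_amount)
```

In the second half only two digits differ between what is shown and what is extracted, which is why this variant tends to go unnoticed until the numbers are used.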
Or you can put text in structures where text extraction software or copy & paste routines usually don't look, for example by creating a large pattern tile containing the text for some text area and filling that area with the matching pattern color.
-> text information present but not seen by most extractors.
E.g. cf. this answer; the technique here is used to make the text of a watermark non-extractable.
Or you can put extra text all over the page but make it invisible, e.g. under images, drawn in rendering mode 3 (invisible), located in some disabled optional content group (layer), ... Text extractors often do not check whether the text they extract actually is visible.
-> text information present but polluted by garbage text bits.
...
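The invisible-text pollution described above can be sketched the same way. This is again a toy model, not a PDF parser: the TextRun class and its fields are hypothetical stand-ins for what a real extractor would have to derive from the graphics state (text rendering mode 3), the optional content configuration, and the drawing order of images.

```python
from dataclasses import dataclass

@dataclass
class TextRun:
    text: str
    render_mode: int = 0            # PDF text rendering mode; 3 = invisible
    layer_enabled: bool = True      # False: in a disabled optional content group
    covered_by_image: bool = False  # True: an image is drawn on top of it

def naive_extract(runs):
    """Collects every text run, visible or not."""
    return " ".join(r.text for r in runs)

def visible_extract(runs):
    """Keeps only runs a reader would actually see on the page."""
    return " ".join(
        r.text
        for r in runs
        if r.render_mode != 3 and r.layer_enabled and not r.covered_by_image
    )

page = [
    TextRun("Amount"),
    TextRun("xq7", render_mode=3),
    TextRun("due:"),
    TextRun("zzz", layer_enabled=False),
    TextRun("120"),
    TextRun("999", covered_by_image=True),
    TextRun("EUR"),
]

polluted = naive_extract(page)
clean = visible_extract(page)
print(polluted)
print(clean)
```

An extractor that skips the visibility checks returns the polluted string; doing the checks properly requires tracking the graphics state and page geometry, which many tools do not do.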