There are PDFs (A) from which we cannot copy characters in a reader, and PDFs (B) whose characters can be copied but turn into human-unreadable garbage when pasted into a text editor. (Encryption in this context doesn't mean password protection.)
- How can I identify these (A) and (B) types of PDFs programmatically? Python is preferred.
- Is it possible to extract the text correctly from these files?