Unable to paste text in readable format from a PDF

Question

I have a PDF document with the following sample text (screenshot):

But when I copy and paste it into Word or another text editor, all I see are weird characters:

    

I am not sure why it gives me weird square boxes instead of the clear, human-readable letters shown in the screenshot. Can someone help me get rid of this issue, or at least tell me how to identify its root cause?

Apparently your PDF is missing the entries required for text extraction. Displaying glyphs is possible without any hint concerning a Unicode code point representing that glyph as a character. — mkl, Aug 15 '20 at 08:42
@mkl - If I understood correctly, this can't be fixed any more? — Panchu, Aug 16 '20 at 03:18
Depending on the number of distinct font objects in the PDF, you may attempt to inject information in that regard, compare [this answer](https://stackoverflow.com/a/39644941/1729265). And another option is OCR... — mkl, Aug 16 '20 at 08:26
Thanks for your suggestion @mkl. I went with the OCR approach and it resolved my issue. — Panchu, Aug 26 '20 at 15:45

score 1 · Answer 1 · answered Aug 26 '20 at 18:08

================== Workaround found ==================

I tried converting the document's corrupted text to a standard ANSI/Unicode format, but most of the online services couldn't recognize these garbage characters.
This issue could probably be resolved with some programming, but I didn't want to invest time in that and preferred a quick, on-the-fly approach.
Finally, as suggested by the user 'mkl', running the document through an OCR service such as "Sejda" or "Adobe OCR" resolved my issue.
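
For anyone who wants a quick way to confirm the root cause before reaching for OCR, here is a minimal sketch in Python (standard library only). It assumes the pasted "square boxes" are private-use, unassigned, control, or replacement characters, which is what typically comes out when a PDF font has no usable mapping back to Unicode; I have not verified that this is exactly what my document produced. The function names `suspect_ratio` and `looks_unmapped` and the 0.5 threshold are my own illustrative choices, not part of any library.

```python
import unicodedata

# Unicode general categories that usually do not represent real text:
# Co = private use, Cn = unassigned, Cc = control, Cs = surrogate.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc", "Cs"}


def suspect_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look unmapped."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspect = sum(
        1
        for c in chars
        # U+FFFD is the replacement character; it is a symbol, not a
        # suspect category, so it is checked explicitly.
        if c == "\ufffd" or unicodedata.category(c) in SUSPECT_CATEGORIES
    )
    return suspect / len(chars)


def looks_unmapped(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: True when at least `threshold` of the text is suspect."""
    return suspect_ratio(text) >= threshold
```

Paste the copied text into a string and call `looks_unmapped(pasted)`. If it returns `True`, the PDF most likely lacks the Unicode mapping mkl mentions, and the realistic options are injecting that mapping per font or running OCR.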
