I have this PDF file, which is in Greek. A known problem occurs when trying to copy and paste text from it, resulting in slight gibberish. The reason I say slight instead of total, is that while the pasted output does not make sense in Greek, it is comprised of valid greek characters. Also, an interesting aspect to the problem is that not all characters are mapped wrong. For example, if you compare this original strip of text
ΕΞ. ΕΠΕΙΓΟΝ – ΑΜΕΣΗ ΕΦΑΡΜΟΓΗ
ΝΑ ΣΤΑΛΕΙ ΚΑΙ ΜΕ Ε-ΜΑIL
with the pasted one from the PDF:
ΔΞ. ΔΠΔΙΓΟΝ – ΑΜΔΗ ΔΦΑΡΜΟΓΗ
ΝΑ ΣΑΛΔΙ ΚΑΙ ΜΔ Δ-ΜΑIL
you will notice that some of the characters are correctly pasted, while others are not. It might also be worthwhile to mention that the wrong characters are reflexively mapped wrong, e.g. Ε becomes Δ and vice-versa.
When I open the PDF using e.g. Adobe, and print it using a PDF writer, in this case CutePDF, the output when copying and pasting is correct!
Given the above, my questions are the following:
- What is the root cause of this behavior?
- How would I go about integrating a solution into a java-based workflow for randomly imported PDF files?
EDIT: a few typos