I use a couple of different programs to convert pdf files to txt files. Usually, this results in good-looking text. Sometimes, it doesn't. I have a set of files that convert in the following way:
Text I can read: Your Account Summary
Copy, paste into Notepad++:
Ghostscript: seems to be a garbage file, full of xEF and xBF characters.
xPdf: gives me a file full of stuff like this: Ç+6 3 É+C ÌÍÍÌ; ÆÁÅ ÅAÁ
The copy-paste output seems closest to English, because each of those control characters appears to stand for exactly one letter: SO == Y, SI == o, STX == u, etc.
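To illustrate the one-to-one mapping I have in mind, here is a rough Python sketch. Only the three pairs listed above come from my files; the function name `decode`, the `?` placeholder, and the sample string are made up for illustration, and I am assuming the mapping is the same throughout a file, which I have not verified.

```python
# Partial substitution table built from the three pairs observed so far.
# SO, SI and STX are the standard ASCII control codes 0x0E, 0x0F and 0x02.
KNOWN = {
    0x0E: "Y",  # SO
    0x0F: "o",  # SI
    0x02: "u",  # STX
}

def decode(raw: str, table=KNOWN, unknown="?") -> str:
    """Replace mapped control characters; mark unmapped ones with `unknown`."""
    out = []
    for ch in raw:
        code = ord(ch)
        if code in table:
            out.append(table[code])
        elif ch.isprintable() or ch in "\n\t":
            # Ordinary text and whitespace pass through unchanged.
            out.append(ch)
        else:
            # A control character with no known letter yet.
            out.append(unknown)
    return "".join(out)

# Hypothetical pasted text: the first three codes are known, \x13 is not.
sample = "\x0e\x0f\x02\x13"
print(decode(sample))  # You?
```

The table would have to be filled in by hand by comparing the pasted characters against text I can read in the PDF viewer, so this only helps if the same mapping holds across all of the files.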
I would like to convert these PDF files to English text.