I'm using Apache PDFBox to extract text from several PDF files. The files are in Polish and contain Polish characters. Unfortunately, when I print the extracted text, I get ? (question marks) in place of those characters.
1 Answer
Assuming your extracted text is stored in a String s, you are presumably printing it like this:
System.out.println(s);
I suggest printing through a PrintStream that encodes its output as UTF-8, so that the Polish characters are written properly:
java.io.PrintStream p = new java.io.PrintStream(System.out,false,"UTF-8");
p.println(s);
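This should work, and ? should no longer appear in the printed text. System.out typically encodes with the platform default charset, and any character that charset cannot represent is written as ?. Note that this PrintStream constructor throws UnsupportedEncodingException, and the console you print to must itself be able to display UTF-8.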
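To see why the encoding matters, here is a minimal, self-contained sketch. The class name Utf8PrintDemo, the helper printTo, and the sample string are made up for illustration; they are not part of PDFBox or of your code. The helper pushes text through a PrintStream with a given charset and returns the bytes that come out, which is roughly what happens on the way to the console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8PrintDemo {

    // Hypothetical helper: writes text through a PrintStream that uses the
    // named charset and returns the bytes it produced.
    static byte[] printTo(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample Polish text, written with Unicode escapes so the source file
        // compiles the same way regardless of its own encoding.
        String s = "Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144";

        // A charset that cannot represent the Polish letters substitutes '?'.
        System.out.println(new String(printTo(s, "US-ASCII"), "US-ASCII"));

        // The approach from the answer: wrap System.out in a UTF-8 PrintStream.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(s);
    }
}
```

The first line printed shows the question marks reproduced on purpose with US-ASCII; the second line only looks right if the terminal or IDE console is also set to UTF-8.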

pyrometer
Actually, I was using System.out and log4j. However, your answer solves my problem! Thanks! – Lukasz Jul 15 '12 at 23:33