I am using iText 5.3.4 to extract text from a PDF file with the following code:
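```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

// pdfReader is an already opened PdfReader for the problem document
PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
TextExtractionStrategy strategy;
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
    // extract the text of page i and append it to the buffer
    strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
    sb.append(strategy.getResultantText());
}
String text = sb.toString();
```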
For one particular PDF, however, an ë is returned as °. Any idea why this might happen and what can be done about it? Is it a bug in the iText library, or was there an error in the construction of the PDF?
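A first step that may help narrow this down is to check which Unicode code points the extracted string really contains, since a console or file writer using the wrong charset can also make a correct ë look wrong. The sketch below is plain Java with no iText calls; the class name `CodePointDump` and the method `describe` are hypothetical, and it assumes you pass in the `text` string built by the loop above.

```java
public class CodePointDump {

    // Hypothetical helper: returns the code points of a string as "U+XXXX" tokens,
    // so the characters returned by getResultantText() can be inspected without
    // depending on how a console or log file renders them.
    public static String describe(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative input only (ë followed by °); in practice pass the extracted text.
        System.out.println(describe("\u00EB\u00B0")); // prints "U+00EB U+00B0"
    }
}
```

If the extracted string shows U+00B0 where U+00EB is expected, the wrong character is coming out of the extraction itself rather than from how it is displayed.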
Thanks for the assistance.