I'm using iTextSharp with the following code to extract text from PDFs, and it was working well. However, today I received a different PDF, and extracting it produced a lot of ? ? ? ? characters.
Does anybody know why that's happening? Is there any way to at least check whether a PDF's text can't be extracted?
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(arquivo);
// iTextSharp page numbers are 1-based.
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
    // Re-encodes via Encoding.Default (the ANSI code page on .NET Framework);
    // characters that code page cannot represent are replaced with '?'.
    currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
    text.Append(currentText);
}
pdfReader.Close();
return text.ToString();
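
For the second part of the question, a rough heuristic could flag a page whose extracted text consists mostly of '?' or the Unicode replacement character U+FFFD. This is only a sketch under assumptions: `ExtractionCheck`, `LooksGarbled`, and the 0.3 threshold are made-up names and values for illustration, not part of iTextSharp, and the threshold would likely need tuning against real documents.

```csharp
using System;

public static class ExtractionCheck
{
    // Hypothetical helper (not part of iTextSharp): returns true when a large
    // share of the non-whitespace characters are '?' or U+FFFD.
    // The default threshold of 0.3 is an arbitrary assumption.
    public static bool LooksGarbled(string extracted, double threshold = 0.3)
    {
        // Treat empty or whitespace-only output as a failed extraction.
        if (string.IsNullOrWhiteSpace(extracted))
            return true;

        int visible = 0;
        int suspicious = 0;
        foreach (char c in extracted)
        {
            if (char.IsWhiteSpace(c))
                continue;
            visible++;
            if (c == '?' || c == '\uFFFD')
                suspicious++;
        }

        // visible is at least 1 here because the string is not all whitespace.
        return (double)suspicious / visible >= threshold;
    }
}
```

It could be called on `currentText` inside the loop, before the text is appended, so that pages that look unreadable can be logged or skipped. A page that legitimately contains many question marks would be flagged too, so this is a hint rather than a reliable test.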