I have successfully converted PDFs to text with iTextSharp using the following code:

    var reader = new PdfReader(filePath);
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();

        String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

        s = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
        strText = strText + s + Environment.NewLine;
        pdfTextBox.Text = strText;
    }
    reader.Close();

However, certain PDFs that display text when opened in a PDF viewer produce empty output (no characters at all).

Does anyone have any ideas why?

All help would be appreciated

Thanks in advance

  • A sample of such a PDF might help determine the problem. – Jongware Feb 16 '14 at 00:06
  • *Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes* - what is your intention in using this? – mkl Feb 16 '14 at 00:28
  • @mkl's question is really important if you are going to process anything above ASCII 127. Please see the answer here for fixing that. http://stackoverflow.com/a/10191879/231316 – Chris Haas Feb 17 '14 at 16:15
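
To illustrate the concern raised in the comments about the conversion line, here is a minimal sketch of what that round trip does to a string. It is not taken from the question or the linked answer: the class name `EncodingRoundTripDemo` and the helper `RoundTrip` are hypothetical, and ISO-8859-1 is used as an assumed stand-in for `Encoding.Default`, whose actual value depends on the runtime and system settings.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Same shape as the conversion line in the question, with `legacy`
    // standing in for Encoding.Default (an assumption for this demo).
    public static string RoundTrip(string s, Encoding legacy)
    {
        // Lossy step: characters the legacy encoding cannot represent
        // are replaced here and cannot be recovered afterwards.
        byte[] legacyBytes = legacy.GetBytes(s);
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        string ascii = "plain text";
        string mixed = "caf\u00E9 \u20AC5"; // U+00E9 exists in Latin-1, U+20AC does not

        // Text the legacy encoding can represent comes back unchanged,
        // so the round trip adds nothing.
        Console.WriteLine(RoundTrip(ascii, latin1) == ascii);

        // Text with a character outside the legacy encoding comes back altered.
        Console.WriteLine(RoundTrip(mixed, latin1) == mixed);
    }
}
```

In other words, a .NET `string` is already Unicode, so a round trip of this kind can at best leave the text unchanged and at worst drop characters. It would not by itself explain completely empty output, though.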

0 Answers