PDF text not displaying when converting using iTextSharp

Asked Feb 15 '14 at 23:34

Active Feb 15 '14 at 23:34

Viewed 265 times

I have successfully converted PDFs to text with iTextSharp using the following code:

    var reader = new PdfReader(filePath);
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();

        String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

        s = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
        strText = strText + s + Environment.NewLine;
        pdfTextBox.Text = strText;
    }
    reader.Close();

However, certain PDFs that display text normally in a PDF viewer produce empty output (no characters) when run through this code.

Does anyone have any ideas why?

All help would be appreciated.

Thanks in advance.

asked Feb 15 '14 at 23:34

Bill Gerold

A sample of such a PDF might help determine the problem. – Jongware Feb 16 '14 at 00:06
*Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes* - what is your intention in using this? – mkl Feb 16 '14 at 00:28
@mkl's question is really important if you are going to process anything above ASCII 127. Please see the answer here for fixing that. http://stackoverflow.com/a/10191879/231316 – Chris Haas Feb 17 '14 at 16:15
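
To illustrate the concern raised in the comments, here is a minimal sketch of what the round trip through a legacy single-byte encoding does to text above ASCII 127. It is not taken from the question or the linked answer. Latin-1 (`iso-8859-1`) is used here as an assumed stand-in for `Encoding.Default`, whose actual value depends on the machine and runtime, and the sample string is made up for the demonstration.

```csharp
using System;
using System.Text;

class EncodingRoundTripDemo
{
    static void Main()
    {
        // Hypothetical sample text, as a PDF extractor might return it.
        // A .NET string is already Unicode (UTF-16 internally).
        string extracted = "Ω ≤ 5 €";

        // Assumption: Latin-1 stands in for a legacy single-byte Encoding.Default.
        Encoding legacy = Encoding.GetEncoding("iso-8859-1");

        // Same round trip as in the question, with the stand-in encoding.
        // Characters that Latin-1 cannot represent are lost in GetBytes.
        byte[] legacyBytes = legacy.GetBytes(extracted);
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);
        string roundTripped = Encoding.UTF8.GetString(utf8Bytes);

        // False: the characters outside Latin-1 did not survive the round trip.
        Console.WriteLine(roundTripped == extracted);
    }
}
```

Since `PdfTextExtractor.GetTextFromPage` already returns a .NET string, the conversion step can at best leave the text unchanged, so using the returned string directly avoids the loss. This addresses character corruption only; it does not by itself explain why some PDFs yield no text at all.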

0 Answers