I have a PDF in which some pages have 1 column and other pages have 2 or 3 columns.

How do I read EVERY page correctly?

The code below does not work properly:

    PdfReader pdfreader = new PdfReader(nmfile);
    string extractText;

    for (int page = 1; page <= pdfreader.NumberOfPages; page++)
    {
        // Create a fresh strategy per page: a reused SimpleTextExtractionStrategy
        // keeps accumulating the text of the pages it has already processed.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        extractText = PdfTextExtractor.GetTextFromPage(pdfreader, page, strategy);
        // No re-encoding step: the returned string is already a .NET (UTF-16) string,
        // and a round trip through Encoding.Default can lose non-ANSI characters.

        //...
    }
Marco Araujo
  • You might have more success if you provided a sample PDF that is representative (in its internals). Currently your question does not add any new information compared to [your former question](http://stackoverflow.com/questions/22046730/itextsharp-problems-reading-pdfs-with-1-column-page1-and-2-columns-page2). – mkl Mar 07 '14 at 15:19
  • Also, please see my comments on your [first question with the same code](http://stackoverflow.com/q/22022559/231316). PDFs don't have "columns", just text that happens to be laid out in a way you consider to look like a column. What @mkl is trying to tell you is that you need to know the exact X,Y coordinates of what you consider to be a column. iTextSharp can't automatically help you with that. Once you've decided on those X,Y coordinates, there are some filters that you can pass them to in order to get the text constrained to that region. – Chris Haas Mar 07 '14 at 15:51
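
The region-filter approach described in the comment above could look roughly like the sketch below. It assumes iTextSharp 5.x, and it assumes that the columns on a page are of equal width and span the full page height, which will not hold for every PDF. The class name `ColumnTextReader`, the method `ReadPage` and the `columns` parameter are hypothetical; the number of columns per page has to be supplied by the caller because iTextSharp does not detect it.

```csharp
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
public static class ColumnTextReader
{
    // Reads one page column by column, left to right.
    // ASSUMPTION: the page is split into `columns` equal-width vertical strips.
    public static string ReadPage(PdfReader reader, int page, int columns)
    {
        Rectangle size = reader.GetPageSize(page);
        float columnWidth = size.Width / columns;
        var text = new StringBuilder();

        for (int i = 0; i < columns; i++)
        {
            // RectangleJ takes (x, y, width, height) in PDF user-space units.
            var region = new System.util.RectangleJ(
                size.Left + i * columnWidth, size.Bottom, columnWidth, size.Height);

            // Only text whose position intersects the region reaches the strategy.
            RenderFilter[] filters = { new RegionTextRenderFilter(region) };

            // A new strategy per region, so text from other regions is not carried over.
            ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(), filters);

            text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
        }
        return text.ToString();
    }
}
```

A call such as `ColumnTextReader.ReadPage(pdfreader, page, 2)` would then return the left strip's text followed by the right strip's. Text chunks that straddle a strip boundary may be reported in both strips or split awkwardly, so real documents will likely need hand-tuned rectangles instead of equal strips.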