
I have a PDF file that I read as a string, page by page. From page 4 onwards, the PDF contains billing information, organized into sections: for example, one section is Local Billing information and another is STD billing information. My requirement is this: if the user wants to validate the Local Billing information, my code should read all the Local Billing data and validate it, and if validation fails for any row, it should highlight that row in the PDF file.

Here is my code in C#:

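using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

    // Reads the text of every page from page 2 onwards and returns it as one string.
    // Returns an empty string if the file does not exist.
    public static string ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();

        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);

            for (int page = 2; page <= pdfReader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text does not accumulate across pages.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                // No re-encoding step: .NET strings are already Unicode, and a round trip
                // through Encoding.Default can lose characters outside the ANSI code page.
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return text.ToString();
    }
} // closes the enclosing class (declaration not shown)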
Stephan
Adi
    Whether or not your requirement can be met in an easy way depends on the nature of your PDFs. It could be easy (but NOT using the `SimpleTextExtractionStrategy`), but it could also be extremely difficult (in general, regardless of the fact that you're using iTextSharp). This is actually a consultancy question: the answer involves several days of work. I doubt that such questions are desirable on SO. – Bruno Lowagie Feb 18 '14 at 10:57
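
For the "easy" case only, here is a minimal sketch of the first step described in the question: pulling the rows of one section out of the already-extracted text. It assumes that each section heading (for example "Local Billing information") appears on a line of its own in the extracted string and that each billing row is one line of text, which may not hold for a given PDF. The class name `BillingSections` and the method `ExtractSection` are hypothetical names chosen for illustration and are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp: works on plain extracted text only.
public static class BillingSections
{
    // Returns the non-empty lines that follow the first line containing startHeading,
    // stopping at the first later line that contains any of otherHeadings.
    public static List<string> ExtractSection(
        string text, string startHeading, IEnumerable<string> otherHeadings)
    {
        var rows = new List<string>();
        bool inSection = false;

        foreach (string rawLine in text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
        {
            string line = rawLine.Trim();
            if (line.Length == 0) continue; // skip blank lines

            if (!inSection)
            {
                // Keep scanning until the wanted heading is found.
                inSection = line.IndexOf(startHeading, StringComparison.OrdinalIgnoreCase) >= 0;
                continue;
            }

            // Inside the section: another section's heading ends it.
            bool isOtherHeading = false;
            foreach (string heading in otherHeadings)
            {
                if (line.IndexOf(heading, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    isOtherHeading = true;
                    break;
                }
            }
            if (isOtherHeading) break;

            rows.Add(line);
        }
        return rows;
    }
}
```

A call such as `BillingSections.ExtractSection(text, "Local Billing information", new[] { "STD billing information" })` would then give the candidate rows to validate. Note that this relies on line breaks surviving extraction, so if the page texts are concatenated as in `ReadPdfFile` above, a line break probably needs to be appended between pages. It also says nothing about where a row sits on the page; highlighting a failed row needs position information that plain text extraction does not provide.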

0 Answers