Searching for a keyword in PDF using iTextSharp 7

Question

I am trying to search for a keyword within a PDF file using C# and iTextSharp.

So I have come across this piece of code:

public List<int> ReadPdfFile(string fileName, String searthText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

            string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            if (currentPageText.Contains(searthText))
            {
                pages.Add(page);
            }
        }
        pdfReader.Close();
    }
    return pages;
}

But the compiler says that PdfReader does not contain a definition for NumberOfPages. Is there another way to get the number of pages in a PDF file?

See this answer, which uses PDF Clown: https://stackoverflow.com/questions/56162692/read-specific-value-based-on-label-name-from-pdf-in-c-sharp/57999452#57999452 – Maytham Fahmi, Oct 17 '19 at 11:50

mkl · Accepted Answer · 2019-10-17T12:08:29.790

The piece of code you found is for iText 5.5.x. iText 7 has a fundamentally changed API, so your NumberOfPages problem is not the only problem you'll have to deal with.

Nonetheless: To get the number of pages in iText 7, you now use the PdfDocument method GetNumberOfPages instead of the former PdfReader property NumberOfPages.
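
As a minimal sketch of just the page-count part (the class name PageCountDemo and taking the file path from the first command-line argument are assumptions made for illustration, not part of your code):

```csharp
using System;
using iText.Kernel.Pdf;

// Hypothetical demo class; only PdfReader, PdfDocument and GetNumberOfPages come from iText 7.
class PageCountDemo
{
    static void Main(string[] args)
    {
        // Assumption: args[0] is the path to an existing, readable PDF file.
        using (PdfDocument pdfDocument = new PdfDocument(new PdfReader(args[0])))
        {
            // iText 7 exposes the page count as a method on PdfDocument,
            // not as a property on PdfReader as in iText 5.5.x.
            int pageCount = pdfDocument.GetNumberOfPages();
            Console.WriteLine(pageCount);
        }
    }
}
```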

And more generally, a port of your method to iText 7 might look like this:

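using System;
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Returns the 1-based numbers of all pages whose extracted text contains searthText.
public List<int> ReadPdfFile(string fileName, String searthText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        using (PdfReader pdfReader = new PdfReader(fileName))
        using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
        {
            // In iText 7 the page count and the pages come from PdfDocument, not PdfReader.
            for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
            {
                // Use a fresh strategy per page so text does not accumulate across pages.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(page), strategy);
                if (currentPageText.Contains(searthText))
                {
                    pages.Add(page);
                }
            }
        }
    }
    return pages;
}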

Francois Borgies · Answer 2 · 2019-10-17T09:33:21.517

You can replace

pdfReader.NumberOfPages

with

getNumberOfPdfPages(fileName)

using this method (reference):

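using System.IO;
using System.Text.RegularExpressions;

// Counts "/Type /Page" entries in the raw file text; "[^s]" skips the "/Type /Pages" tree nodes.
// Note: this only sees page objects stored uncompressed; pages inside compressed
// object streams are not counted.
public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}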

But it seems odd that NumberOfPages is not recognized... Are you sure about your using directives?

edited Oct 17 '19 at 09:33

answered Oct 17 '19 at 09:27

Thank you! Well, this is what I am using: ``` using iText.Kernel.Pdf.Canvas.Parser; using iText.Kernel.Pdf.Canvas.Parser.Listener; using iText.Kernel.Pdf; using iText.Kernel.Pdf.Canvas.Parser.Filter; using iText.Pdfa; using iText.Signatures; ``` I also have a problem with: `string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);` For all 3 arguments it says that it can't convert from one type to another... – Milan Sekulic Oct 17 '19 at 09:38
