I am trying to count the number of minus signs in a PDF document.

I tried opening the document in a binary editor, but the characters cannot be identified directly.

Does anyone know how to do this? Preferably in C#.

Shiraz Bhaiji

4 Answers

Try iTextSharp; it lets you decode a PDF and extract the text in it.
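
As a rough illustration of that approach, here is a minimal sketch assuming iTextSharp 5.x, where `PdfReader` and `PdfTextExtractor.GetTextFromPage` are available. The `MinusCounter` class and its method names are made up for this example, and counting the Unicode minus sign (U+2212) alongside the ASCII hyphen-minus is an assumption about what should count as a "minus sign". Text extraction depends on how the PDF encodes its fonts, so scanned or oddly encoded documents may give different results.

```csharp
using System;
using System.Linq;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
static class MinusCounter
{
    // Counts ASCII hyphen-minus (U+002D) and the Unicode minus sign (U+2212).
    // Which characters qualify as "minus" is an assumption; adjust as needed.
    public static int CountInText(string text)
    {
        return text.Count(ch => ch == '-' || ch == '\u2212');
    }

    // Extracts the text of each page and sums the per-page counts.
    public static int CountInPdf(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            int total = 0;
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                total += CountInText(text);
            }
            return total;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main(string[] args)
    {
        // Usage: pass the path of the PDF as the first argument.
        Console.WriteLine(CountInPdf(args[0]));
    }
}
```

Keeping the counting in `CountInText` separate from the extraction makes it easy to swap in a different PDF library or a different definition of "minus sign" later.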

Felice Pollano

Disclaimer: I work for Atalasoft on PDF technologies. You can use our PdfTextDocument class to do that pretty easily:

// pdfStream is a readable Stream opened on the PDF file.
int minusCount = 0;
using (PdfTextDocument doc = new PdfTextDocument(pdfStream)) {
    using (PdfTextReader reader = doc.GetPdfTextReader()) {
        int c = 0;
        // Read() returns a value < 0 at the end of the text.
        while ((c = reader.Read()) >= 0) {
            if ((char)c == '-') minusCount++;
        }
    }
}
plinth

You need to use a library, like this one for example, to convert the PDF document into something that you can actually parse as text. See this forum post and this answer about that library for some quick pointers.

Paul Sasik

Look at this question: "How to programatically search a PDF document in c#"

boca