I am trying to count the number of minus signs in a PDF document.
I have tried opening the document using a binary editor and see that the characters cannot be identified directly.
Anyone know how to do this? Preferably using C#.
Try iTextSharp; it can decode a PDF and extract the text in it.
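A rough sketch of how that might look, assuming iTextSharp 5.x (the `PdfReader` and `PdfTextExtractor` classes come from that version; later iText releases use a different API). The class name `MinusCounter` and the `pdfPath` parameter are just illustrative names, not part of the library:

```csharp
using System.Linq;
using iTextSharp.text.pdf;         // PdfReader (assumes iTextSharp 5.x is referenced)
using iTextSharp.text.pdf.parser;  // PdfTextExtractor

static class MinusCounter
{
    // Hypothetical helper: extracts the text of each page and counts '-' characters.
    public static int CountMinusSigns(string pdfPath)
    {
        int count = 0;
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Page numbers in iTextSharp are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                count += text.Count(ch => ch == '-');
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

You would call it as `int n = MinusCounter.CountMinusSigns(@"C:\temp\input.pdf");` (the path is only an example). Text extraction depends on how the PDF encodes its fonts, so treat the result as a best effort rather than an exact count.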
Disclaimer: I work for Atalasoft on PDF technologies. You can use our PdfTextDocument class to do that pretty easily:
int minusCount = 0;
using (PdfTextDocument doc = new PdfTextDocument(pdfStream)) {
    using (PdfTextReader reader = doc.GetPdfTextReader()) {
        int c = 0;
        while ((c = reader.Read()) >= 0) { // return < 0 at end
            if ((char)c == '-') minusCount++;
        }
    }
}
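One thing to watch for whichever library does the extraction: what looks like a minus on the page is not necessarily the ASCII hyphen-minus (U+002D). Typeset documents sometimes use the Unicode minus sign (U+2212) instead, so comparing only against '-' may undercount. A minimal sketch of a counter that handles both, working on text you have already extracted (the `MinusSigns` class and its `includeUnicodeMinus` flag are my own illustrative names, not from any library):

```csharp
using System;

static class MinusSigns
{
    private const char HyphenMinus = '\u002D';   // the plain ASCII '-'
    private const char UnicodeMinus = '\u2212';  // the typographic minus sign

    // Counts minus-like characters in already-extracted text.
    public static int Count(string text, bool includeUnicodeMinus = true)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int count = 0;
        foreach (char ch in text)
        {
            if (ch == HyphenMinus || (includeUnicodeMinus && ch == UnicodeMinus))
                count++;
        }
        return count;
    }
}
```

For example, `MinusSigns.Count("3 \u2212 5 = -2")` returns 2, while passing `includeUnicodeMinus: false` returns 1. Whether hyphens inside ordinary words should count as minus signs is a separate question that depends on your documents.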
You need to use a library, like this one for example, to convert the PDF into something that you can actually parse as text. See this forum post and answer about that library for some quick answers.