I am working on a project that needs some functionality for working with PDF files.
I want to read the text of a PDF file in my C#.NET project.
Does anyone know how to do this?
Have a look at the following links:
How to read pdf files using C# .NET
and
Hopefully they point you in the right direction.
Perhaps PDFlib can be used.
From the PDFlib homepage:
PDFlib TET PDF IFilter (Enterprise PDF Search on Windows) extracts text and metadata from PDF documents and makes it available to search and retrieval software on Windows.
Try this library. It is very easy to use and does exactly what you need:
http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET
I would use the getText() method of PDFBox's PDFTextStripper. To implement this, have a look at the following URLs:
http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx
http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C
Short answer: unless you are generating the PDF yourself and doing it correctly, no.
PDF files are generated in a manner similar to what is sent to a printer. Not all of the text in them is readable, and the information about the text can be stored arbitrarily. Some programs also save text as vector outlines or as bitmaps.
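To illustrate why, here is a minimal C# sketch that pulls the literal strings out of the text-showing operators (`Tj`, `TJ`, `'` and `"`) of a single page content stream. It assumes the stream has already been decompressed (real files usually compress it, e.g. with FlateDecode), and the class and method names are hypothetical. It is not a real PDF parser: it ignores fonts, character encodings, hex strings and octal escapes. What it shows is that a content stream only says "draw this string at this position", so the order in the stream need not match reading order, and a scanned page that just paints an image (`Do`) contains no text at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only: NOT a real PDF parser. Assumes the content
// stream is already decompressed; fonts, encodings and hex strings are ignored.
public static class ContentStreamTextSketch
{
    // Returns the literal strings passed to the text-showing operators
    // Tj, TJ, ' and ", in the order they appear in the stream.
    public static List<string> ExtractShownStrings(string content)
    {
        var result = new List<string>();
        var operands = new List<string>(); // literal strings seen since the last operator
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            if (c == '(')
            {
                operands.Add(ReadLiteralString(content, ref i));
            }
            else if (char.IsWhiteSpace(c) || c == '[' || c == ']')
            {
                i++; // TJ arrays are flattened: only their strings are kept
            }
            else
            {
                // Bare token: a number, a /Name, a hex string or an operator.
                int start = i;
                while (i < content.Length && !char.IsWhiteSpace(content[i])
                       && content[i] != '(' && content[i] != '[' && content[i] != ']')
                {
                    i++;
                }
                string token = content.Substring(start, i - start);
                if (token == "Tj" || token == "TJ" || token == "'" || token == "\"")
                {
                    // Kerning numbers inside TJ arrays are dropped, so a gap a
                    // viewer would render as a space is lost here.
                    result.Add(string.Concat(operands));
                    operands.Clear();
                }
                else if (char.IsLetter(token[0]))
                {
                    operands.Clear(); // any other operator (BT, Tf, Td, Do, ...)
                }
            }
        }
        return result;
    }

    // Reads a literal string starting at content[i] == '(' and leaves i just past
    // the matching ')'. Balanced nested parentheses are kept; \n, \r, \t are
    // decoded and any other escaped character (e.g. \( \) \\) is taken literally.
    // Octal escapes such as \053 are NOT decoded.
    private static string ReadLiteralString(string content, ref int i)
    {
        var sb = new StringBuilder();
        int depth = 0;
        i++; // skip the opening '('
        while (i < content.Length)
        {
            char c = content[i++];
            if (c == '\\' && i < content.Length)
            {
                char e = content[i++];
                switch (e)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    default: sb.Append(e); break;
                }
            }
            else if (c == '(')
            {
                depth++;
                sb.Append(c);
            }
            else if (c == ')')
            {
                if (depth == 0) break; // closing parenthesis of this string
                depth--;
                sb.Append(c);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `ExtractShownStrings("BT /F1 12 Tf 150 700 Td (World) Tj ET BT /F1 12 Tf 72 700 Td (Hello,) Tj ET")` returns `World` before `Hello,`, even though `Hello,` is positioned to its left on the page, and `ExtractShownStrings("q 612 0 0 792 0 0 cm /Im1 Do Q")`, a page that only paints an image, returns an empty list. Libraries such as the ones linked above have to reconstruct reading order from the text positions, which is why their results vary.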