How to read Text from pdf file in c#.net web application

Question

I am working on a project where I need to implement some functionality involving PDF files.

I want to read the text of a PDF file in my C#.NET project.

Does anyone know how to do this?

score 3 · Accepted Answer · edited May 23 '17 at 12:07

Have a look at the following links:

How to read pdf files using C# .NET

and

Reading PDF in C#

Hopefully they will point you in the right direction.
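The linked articles are not reproduced here, but as a rough idea of what a PDF text extractor does at its very simplest, here is a deliberately naive sketch. It assumes .NET 6 or later (for `ZLibStream` and `Encoding.Latin1`) and that you have already located one page's content stream, either uncompressed or `/FlateDecode`-compressed; parsing the file's object structure to find that stream is not shown. It collects the string operands of the `Tj` and `TJ` text-showing operators and treats each byte as one character, which only works for simple fonts with a Latin-style encoding. The names `NaivePdfText`, `ExtractText` and `ExtractFromFlate` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

// Deliberately naive: extracts the literal strings shown by Tj and TJ
// operators from ONE page content stream. It ignores fonts, encodings,
// ToUnicode maps, hex strings and text positioning.
public static class NaivePdfText
{
    // "(...) Tj" or "[ ... ] TJ". Strings may contain backslash escapes,
    // but not unescaped nested parentheses (a known limitation here).
    private static readonly Regex TextOperator = new Regex(
        @"\((?<str>(?:\\.|[^\\)])*)\)\s*Tj|\[(?<arr>[^\]]*)\]\s*TJ");

    // Literal strings inside a TJ array; the numbers between them are kerning.
    private static readonly Regex Literal = new Regex(@"\((?<str>(?:\\.|[^\\)])*)\)");

    // Inflates a /FlateDecode stream (zlib format) and extracts its text.
    public static string ExtractFromFlate(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        // Latin1 maps each byte to one char: an assumption about the font encoding.
        return ExtractText(Encoding.Latin1.GetString(output.ToArray()));
    }

    // Returns one line per text-showing operator found in the content stream.
    public static string ExtractText(string content)
    {
        var lines = new List<string>();
        foreach (Match m in TextOperator.Matches(content))
        {
            if (m.Groups["str"].Success)
            {
                lines.Add(Unescape(m.Groups["str"].Value));
                continue;
            }
            // TJ array: concatenate its strings, dropping the kerning numbers.
            var sb = new StringBuilder();
            foreach (Match lit in Literal.Matches(m.Groups["arr"].Value))
                sb.Append(Unescape(lit.Groups["str"].Value));
            lines.Add(sb.ToString());
        }
        return string.Join("\n", lines);
    }

    // Resolves literal-string escapes: \n \r \t \b \f \( \) \\ and \ddd (octal).
    private static string Unescape(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != '\\' || i + 1 == s.Length) { sb.Append(s[i]); continue; }
            char c = s[++i];
            switch (c)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case >= '0' and <= '7':
                    // Up to three octal digits, e.g. \101 is 'A'.
                    int value = c - '0';
                    for (int n = 1; n < 3 && i + 1 < s.Length && s[i + 1] >= '0' && s[i + 1] <= '7'; n++)
                        value = value * 8 + (s[++i] - '0');
                    sb.Append((char)value);
                    break;
                default: sb.Append(c); break; // \( \) \\ and unknown escapes
            }
        }
        return sb.ToString();
    }
}
```

For real documents, use one of the libraries from these answers: they also handle cross-reference tables, font encodings, `ToUnicode` maps, hex strings, text positioning and nested parentheses, none of which this sketch does.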

answered Mar 05 '12 at 08:40

Aristotelis Kostopoulos

score 1 · Answer 2 · answered Mar 05 '12 at 08:38

Perhaps PDFlib could be used.

From the PDFlib homepage:

PDFlib TET PDF IFilter (Enterprise PDF Search on Windows) extracts text and metadata from PDF documents and makes it available to search and retrieval software on Windows.

answered Mar 05 '12 at 08:38

Niels

You are also right; I would like to give points for your answer too. Thanks – amit patel Mar 05 '12 at 10:00

score 1 · Answer 3 · answered Mar 05 '12 at 08:38

Try this library; it is very easy to use and does exactly what you need:

http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET

answered Mar 05 '12 at 08:38

Alex

Thanks, this is the right answer. However, I got a solution from the first link as well – amit patel Mar 05 '12 at 10:01

score 1 · Answer 4 · answered Mar 05 '12 at 08:43

I would suggest using the getText() method of PDFTextStripper. To implement this, have a look at the following URLs:

http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx

http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C

score 0 · Answer 5 · answered Mar 05 '12 at 08:41

Short answer: unless you are generating the PDF yourself and doing it correctly, no.

PDF files are generated in a manner similar to what is sent to a printer. Not all text in them is machine-readable, and the information about the text can be stored arbitrarily. Also, some programs save text as vector outlines or bitmaps.

answered Mar 05 '12 at 08:41

linkerro

The posted links are definitely useful, but you are right that not all text can be read. I have a few PDFs which have 'vector text' in them; is there any library which reads those? – Sujit Singh Apr 22 '18 at 06:06
You would need to rasterize the PDF (turn it into images), then use OCR software to read the text in the images. This will not be very reliable and will probably not scale. In short, not really. – linkerro May 02 '18 at 12:37
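Following linkerro's point, a web application can at least detect when extraction failed and route the page to an OCR step instead. The sketch below is a hypothetical heuristic, not part of any library: it treats empty output, or output with too few letters and digits, as a sign that the page has no usable text layer. The name `LooksExtractable` and the `minLetters` and `minLetterRatio` thresholds are arbitrary assumptions to tune on your own documents, and the check cannot catch garbage that happens to consist of letters (for example, a font that maps glyphs to shifted character codes).

```csharp
using System.Linq;

// Hypothetical helper (not from any library): a rough check on whether the
// text a PDF library returned for a page looks like a real text layer.
public static class TextLayerCheck
{
    // minLetters and minLetterRatio are arbitrary example thresholds.
    public static bool LooksExtractable(string extracted, int minLetters = 20, double minLetterRatio = 0.5)
    {
        // Scanned pages and text drawn as vector outlines usually yield nothing.
        if (string.IsNullOrWhiteSpace(extracted)) return false;

        // Look only at visible characters; at least one exists after the check above.
        string visible = new string(extracted.Where(c => !char.IsWhiteSpace(c)).ToArray());
        int lettersAndDigits = visible.Count(char.IsLetterOrDigit);

        // Unmapped custom font encodings tend to produce symbols or control
        // characters rather than letters, which pulls the ratio down.
        double ratio = (double)lettersAndDigits / visible.Length;
        return lettersAndDigits >= minLetters && ratio >= minLetterRatio;
    }
}
```

Call it on each page's text from whichever library you use; pages that fail the check are candidates for rasterizing and OCR, with the reliability caveats linkerro describes.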
