4

I want to read the content of PDF files, and I need to do it in C on Linux.

The closest I could get was here, but I think Haru can only create PDFs and cannot read them (not 100% sure).

PS: I only need the plain text from the PDF.

Rui Carneiro

3 Answers

4

Check out libpoppler. I've never used it for extracting text, only for querying PDF attributes, but it's pretty easy to use.

eduffy
2

How well do you need to parse them? Just extracting strings should be relatively easy; fully accurate rendering is harder. Take a look at the source for evince or ghostscript.

This is for C++, but it might be a good starting point for understanding PDF structure: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx (sorry, wrong link before)
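
To give an idea of what "just extracting strings" can look like in plain C, here is a minimal sketch. It assumes you already have a decompressed page content stream in memory. Real PDFs usually compress these streams (typically with FlateDecode, so you would need zlib first), and fonts with custom encodings can make the extracted bytes unreadable. The function name `pdf_extract_literal_strings` is made up for this example and is not part of any library. It only collects literal strings of the form `( ... )`, handles nested parentheses and the simple backslash escapes, and ignores octal escapes and hex strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copy the contents of every
 * literal string "( ... )" found in an already-decompressed PDF content
 * stream into out, separating consecutive strings with one space.
 * Output is truncated to fit out_size and is always NUL-terminated.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Simplifications: octal escapes (\ddd), line continuations and hex
 * strings (<...>) are not interpreted. */
size_t pdf_extract_literal_strings(const char *stream, size_t len,
                                   char *out, size_t out_size)
{
    size_t n = 0;   /* bytes written to out so far */
    int depth = 0;  /* parenthesis nesting; 0 means outside a string */

    assert(stream != NULL && out != NULL && out_size > 0);

    for (size_t i = 0; i < len; i++) {
        char c = stream[i];

        if (depth == 0) {
            /* Outside a string: skip operators and operands until '('. */
            if (c == '(') {
                depth = 1;
                if (n > 0 && n + 1 < out_size)
                    out[n++] = ' ';
            }
            continue;
        }

        if (c == '\\' && i + 1 < len) {
            /* Escape sequence: translate the common single-letter forms. */
            c = stream[++i];
            switch (c) {
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case 'b': c = '\b'; break;
            case 'f': c = '\f'; break;
            default:  break; /* \( \) \\ are kept as the literal character */
            }
        } else if (c == '(') {
            depth++;            /* balanced nested parenthesis, kept as text */
        } else if (c == ')') {
            if (--depth == 0)
                continue;       /* end of this string, drop the closing ')' */
        }

        if (n + 1 < out_size)
            out[n++] = c;
    }

    out[n] = '\0';
    return n;
}
```

You would call it as `pdf_extract_literal_strings(data, data_len, buf, sizeof buf)` on each decoded content stream. Note that the pieces of a `TJ` array come out separated by spaces, so a kerned word may be split. For anything beyond a quick experiment, a real PDF library is the safer route.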

Martin Beckett
0

Another possibility, though I've never used it, is VersyPDF. It claims to allow you to edit PDFs ... http://versypdf.sybrex-systems-ltd.qarchive.org/