
I want to write a PDF parser from scratch in Python. Alternatively, I would welcome any leads on adapting existing libraries or algorithms.

Anuj Menta

1 Answer


Several existing Python libraries may fit your needs, for example:

  • pdfrw: Reads and writes PDF files
  • slate: Active development. Simplifies extracting text from PDF files
  • PyPDF2: Active development. Splits, merges, crops, etc.
  • PDFMiner: Active development. Extracts text, images, object coordinates, and metadata from PDF files

More tools are listed at this link.
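If you do want to start from scratch, the first steps are usually reading the file header, locating the cross-reference table through the `startxref` keyword near the end of the file, and pulling out the indirect objects. The following is only a minimal sketch using the standard library. The function names (`build_sample_pdf`, `parse_version`, `find_startxref`, `scan_objects`) and the tiny in-memory sample file are illustrative assumptions, not part of any of the libraries above, and the regex scan for objects is a naive shortcut: a real parser would follow the cross-reference table and handle streams, filters, and encryption.

```python
import re

# Patterns for the three structural markers this sketch looks at.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def build_sample_pdf() -> bytes:
    """Build a tiny, page-less PDF in memory (hypothetical test data)."""
    body = b"%PDF-1.4\n"
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset where this object starts
        body += obj
    xref_pos = len(body)           # byte offset of the xref table
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = (
        b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
        b"startxref\n%d\n%%%%EOF\n" % xref_pos
    )
    return body + xref + trailer


def parse_version(data: bytes) -> tuple:
    """Return (major, minor) from the %PDF-x.y header at the start of the file."""
    m = HEADER_RE.match(data)
    if m is None:
        raise ValueError("missing %PDF header")
    return int(m.group(1)), int(m.group(2))


def find_startxref(data: bytes) -> int:
    """Return the xref offset recorded before the last %%EOF marker."""
    matches = list(STARTXREF_RE.finditer(data[-1024:]))
    if not matches:
        raise ValueError("startxref not found near end of file")
    return int(matches[-1].group(1))


def scan_objects(data: bytes) -> dict:
    """Naively map (object number, generation) to the raw object body."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


if __name__ == "__main__":
    pdf = build_sample_pdf()
    print(parse_version(pdf))
    print(find_startxref(pdf))
    for key, raw in scan_objects(pdf).items():
        print(key, raw)
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to the same functions. Expect the naive object scan to miss or mangle content in files that use compressed object streams.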

Gal Dreiman
  • You will also need a PDF inspector, see http://stackoverflow.com/questions/3549541/best-tool-tool-for-inspecting-pdf-files. Good luck! – Doron Cohen Jun 12 '16 at 07:55