How to make a PDF parser in Python from scratch

Question

I want to write a PDF parser from scratch in Python. Alternatively, any leads on adapting existing libraries or algorithms would be welcome.

Tell us what you already tried and what is not working. Please read http://stackoverflow.com/help/how-to-ask — Alexandre Cartapanis, Jun 12 '16 at 08:09

score 1 · Answer 1 · answered Jun 12 '16 at 07:16


Here are some useful tools for this:

pdfrw: Read and write PDF files.
slate: Active development. Simplifies extracting text from PDF files.
PyPDF2: Active development. Split, merge, crop, etc.
PDFMiner: Active development. Extracts text, images, object coordinates, and metadata from PDF files.

And there is more in this link.
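
If you do want to start from scratch, it may help to see how little is needed to read the outer structure of a file. The sketch below is only an illustration, not how any of the libraries above work: it uses the standard library alone, the function names (pdf_version, startxref_offset, indirect_objects) are made up for this example, and SAMPLE is a toy fragment rather than a complete, valid PDF. It reads the %PDF- header, the startxref offset near the end of the file, and the "N G obj ... endobj" blocks with a naive regex scan.

```python
import re


def pdf_version(data: bytes) -> str:
    """Return the version from the '%PDF-x.y' header at the start of the file."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    if match is None:
        raise ValueError("not a PDF: missing %PDF- header")
    return match.group(1).decode("ascii")


def startxref_offset(data: bytes) -> int:
    """Return the byte offset that follows the last 'startxref' keyword."""
    # The keyword sits near the end of the file, so only the tail is searched.
    matches = re.findall(rb"startxref\s+(\d+)", data[-1024:])
    if not matches:
        raise ValueError("no startxref keyword found near the end of the file")
    return int(matches[-1])


def indirect_objects(data: bytes) -> dict:
    """Map (object number, generation) to the raw bytes of each object body.

    Assumption: a plain regex scan is good enough for a toy file. It can be
    fooled by binary stream data that happens to contain 'obj' or 'endobj',
    and it ignores object streams; a real parser follows the xref table.
    """
    pattern = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in pattern.finditer(data)
    }


# Toy fragment for demonstration only (hypothetical, not a complete PDF).
BODY = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
SAMPLE = (
    BODY
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 2 >>\n"
    + b"startxref\n" + str(len(BODY)).encode("ascii") + b"\n%%EOF\n"
)

if __name__ == "__main__":
    print(pdf_version(SAMPLE))
    print(startxref_offset(SAMPLE))
    print(indirect_objects(SAMPLE))
```

From here the real work begins: parsing the cross-reference table at that offset, tokenizing dictionaries and arrays, and decoding compressed streams. That is where reading the source of one of the libraries above pays off.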


Gal Dreiman

You will also need a PDF inspector, see http://stackoverflow.com/questions/3549541/best-tool-tool-for-inspecting-pdf-files. Good luck! – Doron Cohen Jun 12 '16 at 07:55
