I want to write a PDF parser from scratch in Python. Alternatively, I would welcome any leads on adapting existing libraries or algorithms.
Tell us what you already tried and what is not working. Please read http://stackoverflow.com/help/how-to-ask – Alexandre Cartapanis Jun 12 '16 at 08:09
1 Answer
Here you can find some nice tools for your needs, such as:
- pdfrw: Read and write PDF files
- slate: Active development. Simplifies extracting text from PDF files
- PyPDF2: Active development. Split, merge, crop, etc.
- PDFMiner: Active development. Extracts text, images, object coordinates, and metadata from PDF files
There are more tools listed in this link.
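If you do want to go the from-scratch route, a reasonable first step is probably just locating the file's skeleton before touching any objects. The sketch below uses only the standard library and none of the tools listed above. It assumes the usual layout where the file begins with a `%PDF-x.y` header and ends with `startxref`, a byte offset, and `%%EOF`. The function name `read_pdf_skeleton` and the `sample` bytes are made up for illustration; the sample is a toy fragment, not a complete valid PDF, and real files (incremental updates, cross-reference streams, damaged trailers) need more care than this.

```python
import re


def read_pdf_skeleton(data: bytes) -> dict:
    """Return the header version and the offset stored after 'startxref'.

    Hypothetical helper for illustration only; not part of any library.
    """
    # The header is expected near the start of the file, e.g. b"%PDF-1.4".
    header = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    if header is None:
        raise ValueError("no %PDF- header found")

    # A file may contain several trailers; the last one is the current one.
    trailers = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not trailers:
        raise ValueError("no startxref/%%EOF trailer found")

    return {
        "version": header.group(1).decode("ascii"),
        "xref_offset": int(trailers[-1]),
    }


# Toy stand-in for the bytes of a real file (assumption: not a valid PDF).
sample = (
    b"%PDF-1.4\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n9\n%%EOF\n"
)

info = read_pdf_skeleton(sample)
print(info)  # {'version': '1.4', 'xref_offset': 9}

# The offset should point at the 'xref' keyword of the cross-reference table.
print(sample[info["xref_offset"]:info["xref_offset"] + 4])  # b'xref'
```

For a real file you would read the bytes with `open(path, "rb").read()` and then walk the cross-reference table from that offset to find each object.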

Gal Dreiman
You will also need a PDF inspector, see http://stackoverflow.com/questions/3549541/best-tool-tool-for-inspecting-pdf-files. Good luck! – Doron Cohen Jun 12 '16 at 07:55