-1

I am looking for a script to extract table text from PDFs using pdfminer. I have tried tabula, but I want to integrate both the normal text and the table text into a database. Any ideas on how to implement this are welcome.

Aravind

2 Answers

0

Maybe you can get some ideas from these links:

  1. https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f

  2. https://gist.github.com/jmcarp/7105045

samuel161
  • I have tried using pdfminer and tabula. Both are good for specific purposes: pdfminer extracts all the text in the document, whereas tabula extracts only table-related text. I need to get the table-related text in the document. – Aravind Feb 07 '20 at 08:18
0

As several people suggest in this linked question: How to extract tables from a pdf with PDFMiner?

You can use Camelot, which is built on top of pdfminer, to extract tables from PDFs.

https://camelot-py.readthedocs.io/en/master/user/quickstart.html#read-the-pdf
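
To get both the normal text and the table text into one database, something along these lines might work. This is only a rough sketch, assuming pdfminer.six and camelot-py are installed and using SQLite as a stand-in for whatever database you have. The table names (`page_text`, `table_cells`), the `store` helper and the file name `report.pdf` are made up for illustration, not part of either library.

```python
import sqlite3


def extract_text(pdf_path):
    """Return all text in the PDF as one string (pdfminer.six)."""
    # Imported lazily so the storage code below works without pdfminer.
    from pdfminer.high_level import extract_text as pdfminer_extract_text
    return pdfminer_extract_text(pdf_path)


def extract_tables(pdf_path, pages="1"):
    """Return each table Camelot finds as a list of rows (lists of cells)."""
    # Imported lazily so the storage code below works without Camelot.
    import camelot
    tables = camelot.read_pdf(pdf_path, pages=pages)
    # Each Camelot table exposes its cells as a pandas DataFrame via .df
    return [table.df.values.tolist() for table in tables]


def store(conn, source, text, tables):
    """Hypothetical helper: save full text and table cells side by side."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_text (source TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_cells ("
        "source TEXT, table_idx INTEGER, row_idx INTEGER, "
        "col_idx INTEGER, value TEXT)"
    )
    conn.execute("INSERT INTO page_text VALUES (?, ?)", (source, text))
    for t, rows in enumerate(tables):
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                conn.execute(
                    "INSERT INTO table_cells VALUES (?, ?, ?, ?, ?)",
                    (source, t, r, c, str(value)),
                )
    conn.commit()
```

You would then call it roughly like `store(sqlite3.connect("pdfs.db"), "report.pdf", extract_text("report.pdf"), extract_tables("report.pdf", pages="1"))`. Note that the pdfminer text will still contain the table text as well, so if you need the normal text without the tables you would have to filter that out yourself.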