How to parse a pdf file and extract tables with their titles using python-camelot?

Asked Sep 11 '19 at 16:20

Active Feb 17 '21 at 02:04

Viewed 978 times

I am trying to parse some PDF files to extract key information. Each PDF contains a number of tables that hold part of this information. I used Camelot to extract the tables and got good results, but I also want the title of each table so that I can map each table to its title. Can anyone tell me how to extract a table's title from a PDF using Python?

asked Sep 11 '19 at 16:20

jessy

Currently, Camelot can't extract table titles (https://github.com/atlanhq/camelot/issues/247). If you post the PDF, we can analyze the problem better. – Stefano Fiorucci - anakin87 Sep 12 '19 at 06:26
@Anakin87 thanks. It is not just one PDF with a defined format but a number of PDF files from the financial field. I thought about using OCR, or converting the files to HTML in the hope that the tables can be detected via the `<table>` tag in HTML. – jessy Sep 13 '19 at 15:21

Does this answer your question? [Python PDF Parsing with Camelot and Extract the Table Title](https://stackoverflow.com/questions/58185404/python-pdf-parsing-with-camelot-and-extract-the-table-title) – Brian Wylie Feb 17 '21 at 02:04
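Since Camelot does not return titles itself, one common workaround is a positional heuristic: take the bounding box of each detected table and pick the closest line of text sitting just above it. The sketch below shows only that matching step in plain Python. It rests on assumptions not stated in the question: the helper name `find_title`, the `max_gap` threshold, and the `(text, bbox)` input format are all hypothetical; the table box would have to come from your table extractor and the text lines from a separate text-layout extractor; and coordinates are assumed to be PDF-style, with the origin at the bottom-left and y increasing upward.

```python
from typing import List, Optional, Tuple

# (x0, y0, x1, y1) in PDF points; origin assumed bottom-left, y grows upward.
BBox = Tuple[float, float, float, float]


def find_title(
    table_bbox: BBox,
    text_lines: List[Tuple[str, BBox]],
    max_gap: float = 50.0,  # hypothetical threshold; tune per document family
) -> Optional[str]:
    """Return the nearest non-empty text line directly above the table, if any."""
    t_left, t_right = sorted((table_bbox[0], table_bbox[2]))
    t_top = max(table_bbox[1], table_bbox[3])

    best_text: Optional[str] = None
    best_gap: Optional[float] = None
    for text, (x0, y0, x1, y1) in text_lines:
        stripped = text.strip()
        if not stripped:
            continue
        line_left, line_right = sorted((x0, x1))
        line_bottom = min(y0, y1)
        gap = line_bottom - t_top  # vertical distance from table top to the line
        if gap < 0 or gap > max_gap:
            continue  # line is below the table top, or too far above it
        if line_right < t_left or line_left > t_right:
            continue  # no horizontal overlap with the table
        if best_gap is None or gap < best_gap:
            best_text, best_gap = stripped, gap
    return best_text


# Made-up coordinates, for illustration only.
example_table: BBox = (50.0, 400.0, 500.0, 700.0)
example_lines: List[Tuple[str, BBox]] = [
    ("Introductory paragraph", (50.0, 760.0, 500.0, 772.0)),
    ("Table 1: Revenue by segment", (50.0, 710.0, 300.0, 722.0)),
    ("Page footer", (50.0, 20.0, 200.0, 30.0)),
]
print(find_title(example_table, example_lines))
```

This is a heuristic, not a guarantee: it will fail on documents that place captions below tables or put unrelated text between the caption and the table, so for a mixed set of financial PDFs the threshold and direction would likely need checking per layout.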
