Extract all tables from PDF in Python

Question

I have a PDF and want to extract all tables from it. When I run the code below, I get an empty list.

import pdftables

filepath = 'File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf'
with open(filepath, 'rb') as fh:
    table = pdftables.get_tables(fh)
print(table)

You might want to have a look at https://github.com/camelot-dev/camelot — Martin Thoma, Jul 31 '20 at 09:06
@Neeraj Sharma: Try SLICEmyPDF in 1 of the answers at https://stackoverflow.com/questions/56017702/how-to-extract-table-from-pdf-in-python/72414309#72414309 — 123456, May 29 '22 at 10:23

Michael Dorner · Answer 1 · 2018-09-07T09:22:38.487

2

I assume that the PDF has more than one page? This should work:

from pdftables.pdf_document import PDFDocument
from pdftables.pdftables import page_to_tables

filepath = ...
page_number = ...
with open(filepath, 'rb') as file_object:
    pdf_doc = PDFDocument.from_fileobj(file_object)
    # Use the page_number variable defined above
    pdf_page = pdf_doc.get_page(page_number)
    tables = page_to_tables(pdf_page)
    print(tables)

You can iterate over several pages, too:

for page_number, page in enumerate(pdf_doc.get_pages()):
    tables = page_to_tables(page)
    print(tables)
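
If you would rather keep the results than print them, the loop above can be wrapped in a small helper. This is only a sketch: `collect_tables` is a hypothetical name, and it takes the page iterable and the extraction function as arguments, so it does not itself assume anything about which pdftables version is installed.

```python
def collect_tables(pages, extract_tables):
    """Return {page_number: tables} for every page that yields at least one table.

    pages          -- any iterable of page objects
    extract_tables -- a callable that returns a list of tables for one page
    """
    found = {}
    for page_number, page in enumerate(pages):
        tables = extract_tables(page)
        if tables:  # skip pages where nothing was detected
            found[page_number] = tables
    return found
```

With the names from this answer, the call would be `collect_tables(pdf_doc.get_pages(), page_to_tables)` inside the `with` block, assuming both exist in your installation. An empty dictionary then means that no page produced a table, which helps separate a detection problem from a wrong page number.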

edited Sep 07 '18 at 09:22

answered Sep 07 '18 at 09:14

Michael Dorner


1

I forgot to mention that I am using Python 3 and installed pdftables.six. In this package, `from pdftables.pdf_document import PDFDocument` is not available; instead there is `from pdfminer.pdfdocument import PDFDocument`, which does not have "from_fileobj". – Neeraj Sharma Sep 07 '18 at 11:18
Are you in the correct directory? `File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf` looks like a relative path, not an absolute one. Is that where the file is stored? – Michael Dorner Sep 07 '18 at 12:53
I tried pdfReader just to see whether it prints all the text, but surprisingly it prints everything except the table data. – Neeraj Sharma Sep 09 '18 at 16:02
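
The path question raised in the comments can be checked directly with the standard library. The following is a minimal sketch; `describe_path` is a hypothetical helper name, and the path is the one from the question.

```python
from pathlib import Path

def describe_path(filepath):
    """Report how Python interprets a path from the current working directory."""
    path = Path(filepath)
    return {
        "is_absolute": path.is_absolute(),   # False for a relative path
        "resolved": str(path.resolve()),     # full location Python will look in
        "exists": path.is_file(),            # True only if a file is really there
    }

info = describe_path('File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf')
print(info)
```

If "exists" is False, the script is probably being run from a different directory than expected, and "resolved" shows where Python actually looked.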

score 0 · Answer 2 · answered May 18 '21 at 09:31

0

Install the library below to use pdftables; it worked for me:

> pip install pdftables.six

answered May 18 '21 at 09:31

madan maram

