How to extract tables from a PDF using Camelot?

Question

I want to extract all the tables from a PDF using Camelot in Python 3.

import camelot
# PDF file to extract tables from
file = "./pdf_file/ooo.pdf"
tables = camelot.read_pdf(file)
# number of tables extracted
print("Total tables extracted:", tables.n)
# print the first table as Pandas DataFrame
print(tables[0].df)
# export individually
tables[0].to_csv("./pdf_file/ooo.csv")

With this I get only one table, from the first page of the PDF. How can I extract all the tables from the whole PDF file?

Try SLICEmyPDF in 1 of the answers at https://stackoverflow.com/questions/56017702/how-to-extract-table-from-pdf-in-python/72414309#72414309 — 123456, May 28 '22 at 09:37

score 3 · Answer 1 · answered May 29 '20 at 08:40


tables = camelot.read_pdf(file, pages='1-end')

If the pages parameter is not specified, Camelot analyzes only the first page (the default is pages='1'). For a fuller explanation, see the official documentation.
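As a rough sketch of how the pages argument can be used: Camelot documents it as a comma-separated string such as '1,3,4', '1,4-end' or 'all'. The helper names page_spec and extract_tables and the CSV naming scheme below are made up for illustration; camelot.read_pdf, Table.page, Table.order and Table.to_csv are the Camelot calls being relied on.

```python
from pathlib import Path


def page_spec(pages):
    """Build a Camelot-style pages string from ints and (start, end) tuples.

    Hypothetical helper: page_spec([1, (4, "end")]) gives "1,4-end".
    """
    parts = []
    for item in pages:
        if isinstance(item, tuple):
            start, end = item
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(item))
    return ",".join(parts)


def extract_tables(pdf_path, pages="1-end", out_dir=None):
    """Read tables from the given pages and optionally save each one as CSV."""
    # Imported here so page_spec can be used without Camelot installed.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    if out_dir is not None:
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for table in tables:
            # table.page and table.order say where the table was found.
            table.to_csv(str(out / f"page{table.page}_table{table.order}.csv"))
    return tables
```

For the file in the question, something like extract_tables("./pdf_file/ooo.pdf", pages=page_spec([(1, "end")]), out_dir="./pdf_file") would read every page and write one CSV per table.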

answered May 29 '20 at 08:40

Stefano Fiorucci - anakin87


mike · Answer 2 · 2021-07-07T21:36:34.207

1

To extract the tables from every page of a PDF with Camelot, use the code below. It uses the stream flavor, which detects tables from the whitespace between cells rather than from drawn lines, so it often picks up tables that have no borders. If the extraction is still poor, also pass the row_tol and edge_tol parameters, for example row_tol=0 and edge_tol=500.

import camelot

pdf_archive = camelot.read_pdf(file_path, pages="all", flavor="stream")

# Each item is one detected table (not one page); print it as a DataFrame.
for pdf_table in pdf_archive:
    print(pdf_table.df)
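A possible way to apply the row_tol and edge_tol tuning mentioned above is sketched here. The function names stream_options and read_stream_tables are hypothetical, and the values 0 and 500 are only the ones suggested in this answer, not recommended settings. Table.parsing_report is printed so the effect of different tolerances can be compared.

```python
def stream_options(row_tol=None, edge_tol=None):
    """Keyword arguments for camelot.read_pdf using the stream flavor.

    Tolerances left as None are omitted, so Camelot's defaults apply.
    """
    options = {"pages": "all", "flavor": "stream"}
    if row_tol is not None:
        options["row_tol"] = row_tol
    if edge_tol is not None:
        options["edge_tol"] = edge_tol
    return options


def read_stream_tables(file_path, row_tol=None, edge_tol=None):
    """Read all tables with the stream flavor and print a report for each."""
    # Imported here so stream_options can be used without Camelot installed.
    import camelot

    tables = camelot.read_pdf(file_path, **stream_options(row_tol, edge_tol))
    for table in tables:
        # parsing_report holds accuracy, whitespace, order and page.
        print(table.parsing_report)
        print(table.df)
    return tables
```

Calling read_stream_tables(file_path) first and then read_stream_tables(file_path, row_tol=0, edge_tol=500) makes it easy to see whether the extra parameters actually improve the result for a given PDF.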

edited Jul 07 '21 at 21:36

answered Jul 07 '21 at 08:39

mike


While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – He3lixxx Jul 07 '21 at 21:06
