tabula-py issue: how to extract PDF table data spread across multiple pages

Asked Jun 11 '20 at 19:53

Active Jun 11 '20 at 19:53

Viewed 170 times

I am trying to extract all table data from a PDF using tabula_py as follows: df = tabula.read_pdf(test_pdf, stream=True, multiple_tables=True, pages="all")

The PDF has 3 tables, and the second table spans 2 pages. When I check len(df), it returns 4 instead of 3. The first row of the second table's data on the continuation page is returned as a header. How can I extract the data as a single table, from the header to the last row?
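One possible workaround is to post-process the list that read_pdf returns. This is a minimal sketch, not a built-in tabula-py feature. The helper name merge_continuations and its continued_indices parameter are hypothetical. It assumes you already know which list entries are continuation fragments, and that each fragment has the same number of columns as the table it continues. The row that was wrongly promoted to column labels is put back as a data row, then the fragment is appended to the preceding table.

```python
import pandas as pd


def merge_continuations(tables, continued_indices):
    """Merge page-break fragments back into the table they continue.

    tables: list of DataFrames, e.g. the list returned by tabula.read_pdf.
    continued_indices: positions in `tables` that are continuations of the
        preceding entry (hypothetical parameter; supplied by the caller).
    """
    merged = []
    for i, frag in enumerate(tables):
        if i in continued_indices and merged:
            prev = merged[-1]
            if len(frag.columns) != len(prev.columns):
                raise ValueError(
                    f"fragment {i} has {len(frag.columns)} columns, "
                    f"expected {len(prev.columns)}"
                )
            # The fragment's column labels are really its first data row.
            first_row = pd.DataFrame([list(frag.columns)], columns=prev.columns)
            body = frag.copy()
            body.columns = prev.columns
            merged[-1] = pd.concat([prev, first_row, body], ignore_index=True)
        else:
            merged.append(frag.reset_index(drop=True))
    return merged


if __name__ == "__main__":
    # Synthetic stand-ins for the four fragments described in the question.
    t1 = pd.DataFrame({"a": [1]})
    t2 = pd.DataFrame({"name": ["x", "y"], "qty": ["1", "2"]})
    t3 = pd.DataFrame([["w", "4"]], columns=["z", "3"])  # continuation of t2
    t4 = pd.DataFrame({"k": [9]})
    result = merge_continuations([t1, t2, t3, t4], {2})
    print(len(result))
    print(result[1])
```

With the call from the question, this would be used as merge_continuations(df, {2}), assuming the continuation is the third entry in the list. Values recovered from column labels come back as strings, and pandas may have altered labels that were duplicated or empty, so those cells are worth checking by hand.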

Sharon

hello. did you find a solution? – Charalamm May 17 '21 at 19:15
was there a solution to this? – Standin.Wolf Aug 25 '23 at 16:42

0 Answers