I am trying to extract all table data from a PDF using tabula-py as: df = tabula.read_pdf(test_pdf, stream=True, multiple_tables=True, pages="all")
The PDF has 3 tables. The second table spans 2 pages. When I try len(df), it returns 4 instead of 3. The first row of the second table's data on the continuation page is returned as the header. How can I extract the data as a single table, from the header to the last row?
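To make the desired result concrete, here is a minimal sketch of one possible post-processing step using pandas only. It assumes the continuation fragment has the same number of columns as the first part, and that its detected "header" is really the first data row. The helper names demote_header and merge_split_table are hypothetical, and the two small DataFrames merely stand in for what read_pdf might return.

```python
import pandas as pd


def demote_header(fragment, columns):
    """Turn a mis-detected header back into a data row (hypothetical helper)."""
    # The fragment's column labels are actually the first data row.
    header_row = pd.DataFrame([list(fragment.columns)], columns=columns)
    body = fragment.copy()
    body.columns = columns  # assumes the column counts match
    return pd.concat([header_row, body], ignore_index=True)


def merge_split_table(first_part, continuation):
    """Append a continuation fragment to the table it belongs to (hypothetical helper)."""
    fixed = demote_header(continuation, list(first_part.columns))
    return pd.concat([first_part, fixed], ignore_index=True)


# Stand-in data: the second fragment's header ("3", "c") is really a data row.
first_part = pd.DataFrame({"id": ["1", "2"], "name": ["a", "b"]})
continuation = pd.DataFrame({"3": ["4"], "c": ["d"]})

merged = merge_split_table(first_part, continuation)
print(merged)

# With the list from the question, and assuming the split table sits at
# positions 1 and 2, the three tables might be rebuilt as:
# tables = [df[0], merge_split_table(df[1], df[2]), df[3]]
```

Values recovered from the header come back as strings, so numeric columns may need converting afterwards.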