I am building an API that uses tabula to extract tables from a PDF.
I built the API on a Windows machine and deployed it on Ubuntu 20.
On the Windows machine the extraction was flawless, and I was able to perform all the necessary steps. However, after deploying the FastAPI app on the Ubuntu server, the extraction is incorrect.
I tried passing different parameters, but none of them worked.
The PDF contains tables with no horizontal or vertical ruling lines.
The extracted table on my Windows machine looks something like this:
The extracted table on the Ubuntu server looks like this:
My code looks like this:
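import tabula

# filepath and pages_count are defined elsewhere (not shown).
# Areas are [top, left, bottom, right] in PDF points; the table starts lower on page 1.
area1 = [210, 10, 750, 570]
area2 = [130, 10, 750, 570]
# X coordinates of the column boundaries, in points.
columns = [75, 250, 300, 370, 440, 530]

tables1 = tabula.read_pdf(filepath, guess=False, lattice=False, stream=True,
                          multiple_tables=True, area=area1, pages=1, columns=columns)
tables2 = tabula.read_pdf(filepath, guess=False, lattice=False, stream=True,
                          multiple_tables=True, area=area2,
                          pages=list(range(2, pages_count + 1)), columns=columns)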
I don't know what's causing this issue, especially for this particular PDF. Even after trying multiple combinations of parameters and searching online, I could not reproduce the result I get on my local Windows machine.
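Since the same code and PDF behave differently on the two machines, one thing worth comparing is the environment itself. tabula-py is a wrapper around tabula-java, so a different Java runtime or a different package version could plausibly matter, although that is only a guess here. Below is a minimal sketch that can be run on both machines to print the relevant versions. The helper names (`package_version`, `java_version`, `collect_environment`) are made up for this example, and it uses only the standard library.

```python
import platform
import subprocess
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def java_version():
    """Return the first line of `java -version`, or None if Java is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # `java -version` normally writes to stderr; fall back to stdout.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


def collect_environment():
    """Gather the details most likely to differ between the two machines."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "tabula-py": package_version("tabula-py"),
        "pandas": package_version("pandas"),
        "java": java_version(),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running this on both Windows and Ubuntu and diffing the output shows whether the tabula-py, pandas, or Java versions differ. If they all match, the difference more likely lies elsewhere, for example in how the PDF file reaches the server.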