
I am trying to extract tables from a PDF and write them to Excel using Python's tabula-py. Here is the code:

tabula.convert_into("input.pdf", "output.xlsx", output_format="xlsx", multiple_tables=True, stream=True, spreadsheets=True, pages='all')

Everything runs and I get output.xlsx, but the font sizes and styles are not kept as they are in the PDF. Is there any way to preserve font sizes and styles?

1 Answer

No. tabula-py converts the PDF into CSV by default, not XLSX, so the output.xlsx you get is CSV text under an .xlsx name, and CSV has no notion of font size or style. tabula-java, which tabula-py calls under the hood, has no way to convert into XLSX either.
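One way to confirm this on your own file: every genuine .xlsx workbook is a ZIP archive, while CSV text is not, so checking for the ZIP container tells you which one tabula-py actually wrote. The sketch below uses only the standard library; the helper name `is_real_xlsx` is hypothetical, and the demo fabricates two throwaway files in a temporary directory instead of reading your real output.xlsx.

```python
import os
import tempfile
import zipfile


def is_real_xlsx(path):
    """Return True if `path` is a ZIP archive, the container format every .xlsx uses.

    CSV text saved under an .xlsx name is not a ZIP archive, so this returns
    False for it. Passing this check is necessary, not sufficient, for a
    valid workbook.
    """
    return zipfile.is_zipfile(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for what tabula-py wrote: CSV text behind an .xlsx extension.
        fake = os.path.join(tmp, "output.xlsx")
        with open(fake, "w", encoding="utf-8") as f:
            f.write("Name,Value\nfoo,1\n")

        # A minimal ZIP archive, standing in for a real workbook's container.
        real = os.path.join(tmp, "real.xlsx")
        with zipfile.ZipFile(real, "w") as z:
            z.writestr("[Content_Types].xml", "<Types/>")

        print(is_real_xlsx(fake))  # False
        print(is_real_xlsx(real))  # True
```

If you need an actual workbook, one option is to read the tables into pandas DataFrames with `tabula.read_pdf` and write them with `DataFrame.to_excel`. That produces a genuine .xlsx, but it still carries only the cell values, not the fonts from the PDF.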

chezou