I have Python code that uses tabula-py to read a PDF and extract its text into tabular form, but it prints a warning:

Nov 15, 2017 3:40:23 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for .notdef (9) in font Helvetica

The warning comes from the Java side (tabula-py is a wrapper around tabula-java, which uses Apache PDFBox), so I cannot simply use -W ignore to suppress it.

Is there any way to remove or suppress this warning?

neves
Gammer
  • I believe this is related to this: https://github.com/tabulapdf/tabula-java/issues/115 – Elisha Jan 07 '18 at 14:51
  • I have used the argument silent=True. However, it has not suppressed any warning messages. Does anyone have an answer for this? – hackwithharsha Jul 09 '19 at 11:17
  • Is there a problem with the PDF file? See https://issues.apache.org/jira/plugins/servlet/mobile#issue/PDFBOX-3296. Can you share a sample PDF that produces this problem? – user650654 Jul 13 '19 at 00:36

2 Answers

I'm the author of tabula-py. Setting silent=True suppresses the tabula-java logs. See also:
https://github.com/chezou/tabula-py/blob/e11d6f0ac518810b6d92b60a815e34f32f6bf085/tabula/io.py#L65
https://tabula-py.readthedocs.io/en/latest/tabula.html#tabula.io.build_options
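To illustrate what the flag amounts to, here is a minimal sketch that passes logging-off JVM properties by hand through the java_options parameter of read_pdf. The two properties are my understanding of what silent=True adds in the linked io.py, so treat them as an assumption and check your installed version. read_pdf_quietly, QUIET_JAVA_OPTIONS and the reader argument are hypothetical names introduced only for this example.

```python
# JVM system properties that switch off the two logging backends involved.
# Assumption: these mirror what silent=True adds in the linked io.py.
QUIET_JAVA_OPTIONS = [
    "-Dorg.slf4j.simpleLogger.defaultLogLevel=off",
    "-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.NoOpLog",
]


def read_pdf_quietly(pdf_path, reader=None, **kwargs):
    """Hypothetical helper: call read_pdf with logging-off JVM flags."""
    if reader is None:
        import tabula  # imported lazily; needs tabula-py and a Java runtime

        reader = tabula.read_pdf
    # Keep any JVM options the caller already wanted (string or list).
    extra = kwargs.pop("java_options", None) or []
    if isinstance(extra, str):
        extra = extra.split()
    options = QUIET_JAVA_OPTIONS + list(extra)
    return reader(pdf_path, java_options=options, **kwargs)
```

Usage would look like read_pdf_quietly("/path/to/sample.pdf", pages="all"). On versions that support it, silent=True is the simpler choice; this only shows the manual equivalent.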

chezou
  • This doesn't seem to work. I still get the following for each page: "Picked up _JAVA_OPTIONS: -Djavax.net.ssl.trustStore=C:\Windows\Sun\Java\Deployment\trusted.certs" – Cazforshort Jul 21 '21 at 17:28
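A note on the "Picked up _JAVA_OPTIONS" line: the JVM itself prints it at startup whenever that environment variable is set, so it is not a log message and logging settings do not affect it. If the options in that variable are not needed for PDF extraction (an assumption you should verify, since it drops the trustStore setting for that JVM), one possible workaround is to hide the variable while the JVM is launched. without_java_options_env is a hypothetical helper name.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_java_options_env():
    """Temporarily remove _JAVA_OPTIONS from this process's environment.

    A JVM started inside the block does not see the variable, so it has
    nothing to announce. The original value is restored afterwards.
    """
    saved = os.environ.pop("_JAVA_OPTIONS", None)
    try:
        yield
    finally:
        if saved is not None:
            os.environ["_JAVA_OPTIONS"] = saved
```

You would then wrap the call, e.g. `with without_java_options_env(): tabula.read_pdf("/path/to/sample.pdf", pages="all", silent=True)`.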
tabula-py has a built-in option to suppress the Java warnings.

Try passing silent=True to read_pdf:

tabula.read_pdf("/path/to/sample.pdf", pages="all", silent=True)

Documentation Source

Umair Qadir