I'm trying to extract tables from some PDFs with tabula-py (Python), but with some PDF files I get the error below.
tables = read_pdf(file_path, pages = 'all')
Error from tabula-java:
Error: File does not exist
Traceback (most recent call last):
Input In [71] in <cell line: 1>
tables = read_pdf(file_path, pages = 'all')
File ~\anaconda3\lib\site-packages\tabula\io.py:322 in read_pdf
output = _run(java_options, kwargs, path, encoding)
File ~\anaconda3\lib\site-packages\tabula\io.py:80 in _run
result = subprocess.run(
File ~\anaconda3\lib\subprocess.py:516 in run
raise CalledProcessError(retcode, process.args,
CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\xxx\\anaconda3\\lib\\site-packages\\tabula\\tabula-1.0.5-jar-with-dependencies.jar', '--pages', 'all', '--guess', '--format', 'JSON', 'C:/Users/xx/yyy/Invoice/75211-INV-1180235.PDF']' returned non-zero exit status 1.
It seems to be an error on the Java side, but I can still extract DataFrames from other PDF files without any problem.
I also tried extracting the tables with tabula.exe (which runs in the browser at http://127.0.0.1:8080). It works fine with all the PDF files, including the one that fails when I run the code.
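Since only some files fail when called from code, one way to narrow it down is to run the same call over every invoice and list which files fail and with what message. This is a minimal sketch, not part of tabula-py's API: `extract_all` is a hypothetical helper, `reader` is meant to be `tabula.read_pdf`, and the folder in the usage comment is a placeholder.

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def extract_all(
    pdf_paths: List[str],
    reader: Callable[..., list],
) -> Tuple[Dict[str, list], Dict[str, str]]:
    """Run `reader` on every PDF, keeping successes and failures apart.

    `reader` is meant to be tabula.read_pdf; the assumption is that a failed
    tabula-java run surfaces as subprocess.CalledProcessError, as in the
    traceback above.
    """
    tables: Dict[str, list] = {}
    failures: Dict[str, str] = {}
    for path in pdf_paths:
        try:
            tables[path] = reader(path, pages="all")
        except subprocess.CalledProcessError as e:
            # Java's message (e.g. "Error: File does not exist") is in stderr,
            # which may be captured bytes or None.
            err = e.stderr
            if isinstance(err, bytes):
                err = err.decode("utf-8", errors="replace")
            failures[path] = (err or "").strip() or str(e)
        except Exception as e:  # anything else, reported with its type
            failures[path] = f"{type(e).__name__}: {e}"
    return tables, failures


# Usage (hypothetical folder; requires tabula-py and Java):
# from glob import glob
# from tabula import read_pdf
# ok, failed = extract_all(glob("C:/path/to/Invoice/*.PDF"), read_pdf)
# for path, msg in failed.items():
#     print(path, "->", msg)
```

Comparing the paths in `failed` with those in `ok` may show what the failing files have in common.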
-------------- Update: print log --------------
from tabula import read_pdf

print(file_path)  # 1. print the file path before using tabula on it
# 2a. the try-except block can catch the error output
try:
    tables = read_pdf(file_path, pages='all')
except Exception as e:
    print(e)  # 2b. print the error output or exception
Output:
C:/Users/quock/tapetco/Kinh Doanh - Documents/Chứng Từ/Foreign Airports/AEG/Invoice/error/75211-INV-1180235.PDF
Error from tabula-java:
Error: File does not exist
Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\xxx\\anaconda3\\lib\\site-packages\\tabula\\tabula-1.0.5-jar-with-dependencies.jar', '--pages', 'all', '--guess', '--format', 'JSON', 'C:/Users/xx/yyy/Invoice/75211-INV-1180235.PDF']' returned non-zero exit status 1.
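The printed path contains non-ASCII characters (in `Chứng Từ`), and tabula-java reports that the file does not exist even though the same PDF opens in the tabula web UI. One hypothesis, not confirmed here, is that the path does not survive the hand-off from Python to the Java process. A minimal sketch to test it: check that Python can see the file, list any non-ASCII characters in the path, and if there are some, copy the PDF to a temporary ASCII-only path before calling the reader. `non_ascii_chars` and `read_via_ascii_copy` are hypothetical helpers, and `reader` is assumed to be `tabula.read_pdf`.

```python
import os
import shutil
import tempfile
from typing import Callable, List


def non_ascii_chars(path: str) -> List[str]:
    """Return the distinct non-ASCII characters in `path`, in order of first appearance."""
    seen: List[str] = []
    for ch in path:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen


def read_via_ascii_copy(path: str, reader: Callable[..., list], **kwargs) -> list:
    """Call `reader` on `path`, going through an ASCII-named temp copy if needed.

    Hypothetical workaround: assumes tabula-java fails because of the
    characters in the path, not because of the PDF content.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Python cannot see the file either: {path!r}")
    if not non_ascii_chars(path):
        return reader(path, **kwargs)
    # Copy into a fresh temp directory under a plain ASCII file name.
    # Assumes the temp directory path itself is ASCII-only.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = os.path.join(tmp_dir, "input.pdf")
        shutil.copyfile(path, tmp_path)
        return reader(tmp_path, **kwargs)


# Usage (requires tabula-py and Java):
# from tabula import read_pdf
# print(non_ascii_chars(file_path))
# tables = read_via_ascii_copy(file_path, read_pdf, pages="all")
```

If the copy reads fine while the original path fails, that points at the path rather than the PDF contents.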
I've also uploaded the PDF files: 75211-INV-1180235.pdf (produces the error) and APAG_20170615.pdf (works fine).