
I am new to Python and am setting up some automation for my job, part of which is pulling data from tables in PDF files. The short version: no matter what I try or look up, I cannot get tabula-py to find Java on my portable drive.

I am using a portable IDE setup because I do not have admin privileges on my work computer.

tabula-py throws the usual error saying it cannot find Java and that I should make sure it is in my PATH. I am using Python Portable and jPortable installed to a common directory, with Spyder portable as the IDE. I have run pip install and pip uninstall on both tabula and tabula-py multiple times. I have also used sys.path.append to add the file path of my Java bin folder.

Code:

import pandas as pd
import numpy
import tabula
import sys
sys.path.append('E:\CommonFiles\Java\bin')


df = tabula.read_pdf('E:\CommonFiles\Python-Portable-3.9.6\Scripts\Sample.pdf', pages='all')

Error Message:

runfile('E:/CommonFiles/Python-Portable-3.9.6/Scripts/untitled01.py', wdir='E:/CommonFiles/Python-Portable-3.9.6/Scripts')
Traceback (most recent call last):

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 80, in _run
    result = subprocess.run(

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in __init__
    super(SubprocessPopen, self).__init__(*args, **kwargs)

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,

FileNotFoundError: [WinError 2] The system cannot find the file specified


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "E:\CommonFiles\Python-Portable-3.9.6\Scripts\untitled01.py", line 15, in <module>
    df = tabula.read_pdf('E:\CommonFiles\Python-Portable-3.9.6\Scripts\Sample.pdf', pages='all')

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)

  File "E:\CommonFiles\Python-Portable-3.9.6\apps\lib\site-packages\tabula\io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)

JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`

I have also tried camelot, with similar frustration over ghostscript.dll.

Finally, I looked into pdfplumber, but had even less luck getting it to find the tables, let alone do anything with them.

I am sure this is doable, but my google-fu is failing me: I have spent the better part of 3 days on this without finding a solution through Google, StackOverflow, Reddit, etc.

David Bush

1 Answer


I had the same issue. The solution I found was to use a portable Java and register it in the user environment PATH. First, install Java from the EXE installer as explained here (https://stackoverflow.com/a/6571736/11322275). Then add the folder where you saved Java to the user environment PATH as explained here (https://stackoverflow.com/a/67844469/11322275). Once that is done, make sure you can run java -version from the command prompt. You will probably need to restart Spyder afterwards so that it picks up the new PATH.
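If editing the user environment variables is not an option, the same idea can probably be applied from inside the script. The traceback in the question shows tabula-py starting `java` through `subprocess`, which looks the command up on the PATH environment variable, whereas `sys.path` only controls where Python searches for modules. A minimal sketch, assuming the Java bin folder from the question; `prepend_to_path` and `read_tables` are hypothetical helper names, not part of tabula-py:

```python
import os
import shutil


def prepend_to_path(directory):
    """Put `directory` at the front of this process's PATH (not sys.path)."""
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)


def read_tables(pdf_path, java_bin):
    """Make `java` discoverable, then hand the PDF to tabula-py."""
    prepend_to_path(java_bin)
    # shutil.which searches the same PATH that a child process would inherit.
    if shutil.which("java") is None:
        raise RuntimeError(f"no java executable found in {java_bin!r}")
    # Imported here so the helper above is usable without tabula-py installed.
    import tabula
    return tabula.read_pdf(pdf_path, pages="all")
```

With the folders from the question, the call would look like `read_tables(r"E:\CommonFiles\Python-Portable-3.9.6\Scripts\Sample.pdf", r"E:\CommonFiles\Java\bin")`. The raw strings matter: in a normal string literal the `\b` in `Java\bin` is read as a backspace character, so the path would be wrong even if it were added to the right variable. The change only lasts for the current Python process, so it has to run before every `read_pdf` call in a fresh session.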

moju