Goal: to get a working version of this tutorial with PDFs, via Visual Studio Code.
I am trying to install camelot via VSCode, using Poetry, but am having dependency problems.
This works in Jupyter Notebooks (bottom of post), but I am attempting to append to an existing .py project.
Code:
import glob
import camelot
import pandas as pd
import multiprocessing.dummy as mp
import ctypes
from ctypes.util import find_library
find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")))
PDF_LIST = glob.glob('../data/gri/reports/*.pdf')
def scrape_tables(pdf_filename):
    tables = camelot.read_pdf(pdf_filename)
    print("Total tables extracted:", tables.n)
    return tables
p = mp.Pool(len(PDF_LIST))
pdf_esg_scraped = p.map(scrape_tables, PDF_LIST)
pip install camelot-py:
Traceback (most recent call last):
File "scrape_tables.py", line 25, in <module>
import camelot
ModuleNotFoundError: No module named 'camelot'
pip install camelot:
danielbellhv@PF2DCSXD:/mnt/c/Users/dabell/Documents/GitHub/workers-python/workers/data_simulator/src$ pip install camelot
Requirement already satisfied: camelot in /home/me/.local/lib/python3.8/site-packages (12.6.29)
Requirement already satisfied: Elixir>=0.7.1 in /home/me/.local/lib/python3.8/site-packages (from camelot) (0.7.1)
Requirement already satisfied: SQLAlchemy<0.8.0,>=0.7.7 in /home/me/.local/lib/python3.8/site-packages (from camelot) (0.7.10)
Requirement already satisfied: xlrd==0.7.1 in /home/me/.local/lib/python3.8/site-packages (from camelot) (0.7.1)
Requirement already satisfied: Jinja2>=2.5.5 in /usr/lib/python3/dist-packages (from camelot) (2.10.1)
Requirement already satisfied: xlwt==0.7.2 in /home/me/.local/lib/python3.8/site-packages (from camelot) (0.7.2)
Requirement already satisfied: sqlalchemy-migrate>=0.7.1 in /home/me/.local/lib/python3.8/site-packages (from camelot) (0.11.0)
Requirement already satisfied: chardet>=1.0.1 in /usr/lib/python3/dist-packages (from camelot) (3.0.4)
Requirement already satisfied: decorator in /home/me/.local/lib/python3.8/site-packages (from sqlalchemy-migrate>=0.7.1->camelot) (5.1.0)
Requirement already satisfied: pbr>=1.8 in /home/me/.local/lib/python3.8/site-packages (from sqlalchemy-migrate>=0.7.1->camelot) (5.8.0)
Requirement already satisfied: Tempita>=0.4 in /home/me/.local/lib/python3.8/site-packages (from sqlalchemy-migrate>=0.7.1->camelot) (0.5.2)
Requirement already satisfied: six>=1.7.0 in /usr/lib/python3/dist-packages (from sqlalchemy-migrate>=0.7.1->camelot) (1.14.0)
Requirement already satisfied: sqlparse in /home/me/.local/lib/python3.8/site-packages (from sqlalchemy-migrate>=0.7.1->camelot) (0.4.2)
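One way to narrow this down is to check which interpreter is running and which distributions it can see. The pip output above lists Elixir and SQLAlchemy as dependencies of camelot 12.6.29, which suggests that the camelot distribution is a different project from camelot-py. The import failure after installing camelot-py may also mean that the script runs under a different interpreter (for example a Poetry virtual environment) than the one pip installed into. The sketch below is only a diagnostic under those assumptions; installed_version is a hypothetical helper name, not part of any library.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Show which interpreter is executing this script.
print("interpreter:", sys.executable)

# Compare the two similarly named distributions from the post.
for name in ("camelot-py", "camelot"):
    print(name, "->", installed_version(name))
```

Running this once from the VSCode terminal and once from the Jupyter kernel should show whether both environments use the same interpreter and the same set of packages.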
Attempted Solution:
I got the Ghostscript file path as output in Jupyter Notebook:
import ctypes
from ctypes.util import find_library
find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")))
>>> 'C:\\Users\\me\\Anaconda3\\Library\\bin\\gsdll64.dll'
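For reference, the expression passed to find_library only builds a file name from the pointer size of the running interpreter. A minimal sketch of that step, assuming nothing beyond the standard library (bits and dll_name are illustrative names; ctypes.c_void_p is the documented spelling of c_voidp):

```python
import ctypes
from ctypes.util import find_library

# Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
bits = ctypes.sizeof(ctypes.c_void_p) * 8

# Same name the original one-liner builds with "".join(...).
dll_name = f"gsdll{bits}.dll"

# find_library returns a path or name if the library is found, otherwise None.
print(dll_name, "->", find_library(dll_name))
```

When no library of that name can be located, find_library returns None, so the result depends on which interpreter and operating system run the code.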
Using this output, I need to "append new location to the PATH variable".
However, this still does not work. Variations I have tried:
PATH=$PATH:C:\Users\me\Anaconda3\Library\bin\gsdll64.dll
PATH=$PATH:C:\\Users\\me\\Anaconda3\\Library\\bin\\gsdll64.dll
PATH=$PATH:'c/Users/me/Anaconda3/Library/bin/gsdll64.dll'
PATH=$PATH:'/mnt/c/Users/me/Anaconda3/Library/bin/gsdll64.dll'
export PATH="$HOME/.poetry/bin:$PATH";C:/Users/me/Anaconda3/Library/bin/gsdll64.dll
bash: C:/Users/me/Anaconda3/Library/bin/gsdll64.dll: No such file or directory
Further, in order to get Poetry to work, I need to have PATH point to its location:
export PATH="$HOME/.poetry/bin:$PATH"
Can I have multiple PATH variables?
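On the last point: PATH is a single environment variable whose value is a list of directories joined by a separator (":" in POSIX shells, ";" on Windows), so several locations go into the same variable. Entries are directories rather than individual files. A minimal sketch of that format follows, where append_to_path is a hypothetical helper and the directory is taken from the attempts above. It only illustrates the format and does not claim that a Windows DLL becomes usable from WSL.

```python
import os


def append_to_path(path_value, new_dir, sep=os.pathsep):
    """Return path_value with new_dir appended, skipping empty and duplicate entries."""
    entries = [entry for entry in path_value.split(sep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return sep.join(entries)


# Poetry's directory and a second directory live in the same PATH value.
example = append_to_path(
    "/home/me/.poetry/bin:/usr/bin",
    "/mnt/c/Users/me/Anaconda3/Library/bin",  # directory only, not the .dll file
    sep=":",
)
print(example)
```

To apply the same idea to the current process, the returned string could be assigned to os.environ["PATH"]; that change affects only that process and its children.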