18

A .py program works, but the exact same code doesn't work when exposed as an API.

The code reads a PDF with Tabula and returns the table content as output.

I've tried:

import tabula
df = tabula.read_pdf("my_pdf")
print(df)

and

from tabula import wrapper
df = wrapper.read_pdf("my_pdf")
print(df)

I've installed tabula-py (not tabula) on AWS EC2 running Ubuntu.

More than read_pdf, I actually want to convert the PDF to CSV and return that as the output. But that doesn't work either. I get the same no-attribute error, i.e. module 'tabula' has no attribute 'convert_into'.

The .py file and the API file (.py as well) are in the same directory and are accessed with the same user.

Any help will be highly appreciated.

EDIT: I tried to run the same Python file from the API as an OS command (os.system("python3 /home/ubuntu/flaskapp/tabler.py")). But that didn't work either.

Sukhi
  • What does your `pip freeze` show? – Chathuranga Feb 24 '20 at 14:32
  • The details are here - https://pastebin.com/yGBgr5jM. Note that the same API file exposes more functionality, so you'll find more pip components than just tabula. – Sukhi Feb 24 '20 at 14:39
  • I tried to run the same python file from the API as OS command (`os.system("python3 /home/ubuntu/flaskapp/tabler.py")`). But it didn't work as well. – Sukhi Feb 24 '20 at 14:44
  • Have you named one of your scripts `tabula.py`, by any chance? `import` might pick that up in preference to the installed module. Or do `import tabula; print(dir(tabula))` to see exactly what names it *is* defining. – jasonharper Feb 24 '20 at 14:57
  • No. None of my files is named "tabula". – Sukhi Feb 24 '20 at 15:08
  • According to your pip freeze, neither tabula nor tabula-py is installed. But that cannot be the case, since you don't get the error on the import statement. Are you in a virtualenv? – Chathuranga Feb 24 '20 at 15:33
  • Thanks Chathuranga. Appreciate your help. It's not a virtual env. The details are here - https://www.datasciencebytes.com/bytes/2015/02/24/running-a-flask-app-on-aws-ec2/ Besides, I can run the standalone .py file to run tabula. – Sukhi Feb 24 '20 at 15:36
  • can you ssh and see if the same works in python shell? – Chathuranga Feb 24 '20 at 15:39
  • I don't know how to check on Python shell. I'll find out. But the program (not API) works when I ssh and run it from Ubuntu terminal prompt. – Sukhi Feb 24 '20 at 16:33
  • There must have been some mistake in the pastebin link. https://pastebin.com/FwGbNL9H is the right one which shows tabula-py 2.0.4 is installed. – Sukhi Feb 24 '20 at 16:48
  • Did you solve this problem? I am having same problem – shekwo Apr 16 '20 at 15:30
  • Yes. It always worked well on my machine as an installed component (not via API). So, I created a docker container and put it in AWS ECS. The API works well from there. – Sukhi Apr 16 '20 at 16:58
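
The checks suggested in the comments above (which `tabula` is actually being imported, and which names it defines) can be sketched as a small script to run both from the terminal and from inside the API process. This is only a diagnostic sketch: `describe_module` is a hypothetical helper, not part of tabula-py, and on recent Python versions an `origin` of `None` usually points to a namespace package (a directory without `__init__.py`), which is an assumption about the cause rather than a confirmed diagnosis.

```python
import importlib
import importlib.util
import sys


def describe_module(name, expected_attrs):
    """Report where `name` is imported from and which expected attributes it lacks.

    Hypothetical helper for illustration; it is not part of tabula-py.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module cannot be found at all on this interpreter's sys.path.
        return {"found": False, "origin": None, "missing": list(expected_attrs)}
    module = importlib.import_module(name)
    missing = [attr for attr in expected_attrs if not hasattr(module, attr)]
    return {"found": True, "origin": spec.origin, "missing": missing}


if __name__ == "__main__":
    # Compare these two lines between the standalone run and the API run.
    print("interpreter:", sys.executable)
    print(describe_module("tabula", ["read_pdf", "convert_into"]))
```

If the interpreter path or the reported origin differs between the two runs, the API is probably not using the same installation as the standalone script.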

8 Answers

31

Make sure that you installed tabula-py, not just tabula. Use

!pip install tabula-py

and, to import it, use

from tabula.io import read_pdf
yasmine
8

There is actually an entry in the FAQ about this specific issue:

If you’ve installed tabula, it will be conflict the namespace. You should install tabula-py after removing tabula.

Although using read_pdf() from tabula.io worked, as suggested by other answers, I was also able to use tabula.read_pdf() after removing tabula and reinstalling tabula-py (using pip install --force-reinstall tabula-py).

Skippy le Grand Gourou
5

If you accidentally installed tabula before installing tabula-py, they'll conflict in the namespace (even after uninstalling tabula).

Uninstall tabula-py and re-install it. That did the trick for me.
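
A quick way to see whether both distributions are currently installed is to query the package metadata. The sketch below assumes Python 3.8+ (for `importlib.metadata`); `installed_versions` and `has_conflict` are hypothetical helper names. It only detects the case where both packages are present, so leftover files from an already-uninstalled `tabula` would not show up here.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_conflict(versions):
    """Return True when both `tabula` and `tabula-py` are installed."""
    return versions.get("tabula") is not None and versions.get("tabula-py") is not None


if __name__ == "__main__":
    found = installed_versions(["tabula", "tabula-py"])
    print(found)
    if has_conflict(found):
        print("Both are installed; remove tabula, then reinstall tabula-py.")
```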

Jeff Martin
3

There is something off with the tabula package. I looked inside and there is no __init__.py. You can do:

from tabula.io import read_pdf

It worked for me.

2

from tabula import read_pdf didn't work for me. I've replaced tabula.read_pdf() by tabula.io.read_pdf() to make it work.
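
If the same code has to run on machines where either import path may be the working one, the two variants from these answers can be tried in order. This is a minimal sketch, not an official tabula-py pattern; `resolve_attr` is a hypothetical helper, and it works around the symptom without fixing the underlying package conflict.

```python
import importlib


def resolve_attr(module_names, attr):
    """Return `attr` from the first module in `module_names` that provides it.

    Hypothetical helper for illustration: modules that fail to import or that
    lack the attribute are skipped.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise AttributeError(f"{attr!r} not found in any of {module_names}")
```

With tabula-py installed, `read_pdf = resolve_attr(["tabula", "tabula.io"], "read_pdf")` would pick up the function from whichever path exposes it.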

Ruli
0

If you are working in Colab, you have to install it with the command

!pip install -q tabula-py
import tabula

To use functions like read_pdf and convert_into, call them through tabula.io:

dfs = tabula.io.read_pdf(path, stream=True)

Note: in Colab, these functions should be accessed through tabula.io. Have a good day, and long live the data science community.

  • Please check existing answers before posting yours to make sure your answer does not duplicate other answers. You are providing the same commands as in the most-upvoted answer right now (the `-q` option just disables messages, i.e., it is short-form of `--quiet`). – AlexK Jun 18 '22 at 06:13
  • @AlexK Sir, I have provided detailed working steps for Colab. The answer is more elaborate, and it was well tested before being posted. – NIKHIL gla Jun 19 '22 at 14:11
-1

Try

from tabula import read_pdf

I had the same problem, and this fixed it.

stygarfield
-1

It is working this way:

import tabula  # just this here!

# Declare the path of your file
file_path = "/path/to/pdf_file/data.pdf"

# Convert your file
df = tabula.io.read_pdf(file_path)

That is all!

YoYoYo