Questions tagged [python-camelot]

Camelot is a Python library that makes it easy for anyone to extract tabular data from PDF files.

Why Camelot?

  • You are in control. Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
  • Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
  • Each table is a pandas DataFrame, which seamlessly integrates into ETL and data analysis workflows.
  • Export to multiple formats, including JSON, Excel and HTML.
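
The accuracy/whitespace filtering described above can be sketched as follows. This is a minimal sketch that assumes `camelot-py` is installed. The 80% accuracy and 20% whitespace thresholds are arbitrary illustrative values, not Camelot defaults. The filter only needs objects exposing `accuracy` and `whitespace`, so it can be tried without a PDF.

```python
from typing import Iterable, List, Protocol


class ScoredTable(Protocol):
    """Anything carrying Camelot-style quality metrics, as percentages."""
    accuracy: float
    whitespace: float


def keep_good_tables(tables: Iterable[ScoredTable],
                     min_accuracy: float = 80.0,
                     max_whitespace: float = 20.0) -> List[ScoredTable]:
    """Keep tables whose metrics suggest a clean parse (thresholds are illustrative)."""
    return [t for t in tables
            if t.accuracy >= min_accuracy and t.whitespace <= max_whitespace]


def extract_and_export(path: str) -> None:
    """Parse every page, drop weak tables, and export the rest."""
    import camelot  # installed from PyPI as camelot-py

    tables = camelot.read_pdf(path, pages="all")
    for i, table in enumerate(keep_good_tables(tables)):
        print(table.parsing_report)          # accuracy, whitespace, order, page
        table.df.to_json(f"table_{i}.json")  # .df is a pandas DataFrame
        table.to_html(f"table_{i}.html")     # Camelot's own exporter
```

`extract_and_export("report.pdf")` (a placeholder file name) would write one JSON and one HTML file per kept table. For exporting everything at once, Camelot's README shows `tables.export("foo.csv", f="csv")` on the returned table list.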

See comparison with other PDF table extraction libraries and tools.

197 questions

15 votes, 7 answers

AttributeError: module 'camelot' has no attribute 'read_pdf'

I am trying to extract tables from a PDF using camelot and I get this attribute error. Could you please help?
import camelot
import pandas as pd
pdf = camelot.read_pdf("Gordian.pdf")
AttributeError Traceback (most recent…
Yousra
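
One commonly reported cause of this error is that `import camelot` resolves to something other than Camelot. Examples include the unrelated `camelot` package on PyPI (Camelot itself is published as `camelot-py`) or a local file named `camelot.py` that shadows the library. The helper below is a hypothetical diagnostic, not part of Camelot. It reports where the imported module lives and whether it exposes `read_pdf`.

```python
import importlib
import importlib.util
from typing import Optional, Tuple


def diagnose_read_pdf(module_name: str = "camelot") -> Tuple[str, Optional[str]]:
    """Return (status, location) for whatever `import module_name` would load.

    status is one of:
      "missing"     - nothing importable under that name
      "ok"          - the module exposes a read_pdf attribute
      "no-read_pdf" - something was imported, but it has no read_pdf
    """
    if importlib.util.find_spec(module_name) is None:
        return "missing", None
    module = importlib.import_module(module_name)
    location = getattr(module, "__file__", None)
    status = "ok" if hasattr(module, "read_pdf") else "no-read_pdf"
    return status, location
```

If `diagnose_read_pdf()` reports `no-read_pdf` with a path to your own `camelot.py`, rename that file. If the path points to an unrelated package in `site-packages`, uninstalling it and running `pip install camelot-py` is the usual next step.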

13 votes, 5 answers

Camelot: DeprecationError: PdfFileReader is deprecated

I have been using camelot for our project, but for the past 2 days I have been getting the following error message. When trying to run the following code snippet:
import camelot
tables = camelot.read_pdf('C:\\Users\\user\\Downloads\\foo.pdf', pages='1')
I get this…
Said Akyuz
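
This error is commonly attributed to PyPDF2 3.0, which turned the old `PdfFileReader` name into a `DeprecationError` while older Camelot releases still call it. The usual workarounds are pinning PyPDF2 below 3.0 (`pip install "PyPDF2<3"`) or upgrading to a Camelot release that supports the newer API. Treat the exact version boundary as an assumption to verify against your environment. The sketch below only inspects what is installed, using `importlib.metadata`. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def major(version: str) -> int:
    """Leading integer of a version string, e.g. '3.0.1' -> 3."""
    digits = ""
    for ch in version.split(".")[0]:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def pypdf2_removed_pdffilereader(version: str) -> bool:
    # Assumption: PyPDF2 >= 3.0 raises DeprecationError for PdfFileReader.
    return major(version) >= 3


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pypdf2() -> str:
    version = installed_version("PyPDF2")
    if version is None:
        return "PyPDF2 is not installed"
    if pypdf2_removed_pdffilereader(version):
        return f'PyPDF2 {version} installed; try: pip install "PyPDF2<3"'
    return f"PyPDF2 {version} installed; PdfFileReader should still be available"
```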

12 votes, 2 answers

Camelot is reading only the first page of the pdf

tables = camelot.read_pdf(r"C:\Users\Ayush ShaZz\Desktop\Code_Python\FoodCaloriesList.pdf")
for table in tables:
    print(table.df)
It's reading only the first page. Someone please help me out.
Ayush ShaZz
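
`camelot.read_pdf` parses only page 1 by default (`pages='1'`). To cover more pages, pass `pages='all'` or a comma-separated spec such as `'1,3-5'` or `'2-end'`. The hypothetical helper `pages_spec` below builds such a string from Python values. The `read_pdf` call assumes `camelot-py` is installed.

```python
from typing import Iterable, Tuple, Union

PageItem = Union[int, Tuple[int, Union[int, str]]]


def pages_spec(items: Iterable[PageItem]) -> str:
    """Build a Camelot `pages` string, e.g. [1, (3, 5), (7, "end")] -> "1,3-5,7-end"."""
    parts = []
    for item in items:
        if isinstance(item, tuple):
            start, stop = item
            parts.append(f"{start}-{stop}")
        else:
            parts.append(str(item))
    return ",".join(parts)


def read_every_page(path: str):
    import camelot  # installed from PyPI as camelot-py

    tables = camelot.read_pdf(path, pages="all")
    for table in tables:
        print(table.page, table.df.shape)  # page number and (rows, columns)
    return tables
```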

10 votes, 0 answers

Same table is extracted twice from a pdf by Camelot-py

I am trying to extract tables from a multi-page PDF file using camelot-py v0.7.3. So far it has been the best PDF reader tool for me. I just needed to read the PDF line by line and detect the table manually. I tried many other tools such as tabula,…
mk09
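
The question has no answers, so the cause is unknown. As a workaround only, duplicates can be dropped after extraction by comparing each table's page number and cell text. The sketch assumes Camelot `Table` objects expose `page` and `data` (a list of rows of cell strings). `drop_duplicate_tables` is a hypothetical helper, not a Camelot function.

```python
from typing import Hashable, List, Tuple


def table_key(table) -> Tuple[Hashable, ...]:
    """Identity of a table: its page number plus the stripped text of every cell."""
    rows = tuple(tuple(cell.strip() for cell in row) for row in table.data)
    return (table.page, rows)


def drop_duplicate_tables(tables) -> List:
    """Keep the first occurrence of each (page, cell text) combination."""
    seen = set()
    unique = []
    for table in tables:
        key = table_key(table)
        if key not in seen:
            seen.add(key)
            unique.append(table)
    return unique
```

Keying on the page means the same table repeated on different pages is still kept once per page. Drop `table.page` from the key if that is not wanted.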

8 votes, 17 answers

Python-camelot (Error: GhostscriptNotFound while it is installed)

I am trying to extract tabular data from a PDF using camelot and I am getting the following error.
Code:
tables = camelot.read_pdf(file_name)
Error:
GhostscriptNotFound: Please make sure that Ghostscript is installed and available on the PATH…
Venkatesan R
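
In older Camelot releases, the default `lattice` flavor renders pages through Ghostscript. This error means Camelot could not locate Ghostscript from Python, even if it is installed. A quick check is whether the Ghostscript executable and shared library are discoverable at all. The names probed below are the usual Ghostscript names on Linux/macOS and Windows. Treat them as assumptions for your platform: `gs`, `gswin64c` and `gswin32c` for the executable, and `gs`, `gsdll64` and `gsdll32` for the library.

```python
import shutil
from ctypes.util import find_library
from typing import Dict, Optional

EXECUTABLES = ("gs", "gswin64c", "gswin32c")
LIBRARIES = ("gs", "gsdll64", "gsdll32")


def find_ghostscript() -> Dict[str, Optional[str]]:
    """Report the first Ghostscript executable on PATH and the first library ctypes can find."""
    executable = next((p for p in map(shutil.which, EXECUTABLES) if p), None)
    library = next((lib for lib in map(find_library, LIBRARIES) if lib), None)
    return {"executable": executable, "library": library}
```

If `executable` is found but `library` is `None`, the shared library is likely not on the search path. On Windows, a frequently reported fix is adding Ghostscript's `bin` and `lib` folders to `PATH` and restarting the shell or IDE.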

8 votes, 3 answers

Python Camelot borderless table extraction issue

I'm trying hard to extract some borderless tables from PDF files, as shown in the image below. I have installed python-camelot as shown here and it is working fine, but only for bordered tables. Please find the details below: platform -…
Richie
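
Borderless tables have no ruling lines for the `lattice` flavor to detect. Camelot's `stream` flavor infers columns from the whitespace between text, so it is the usual choice here. The sketch below assumes `camelot-py` is installed. `edge_tol`, `row_tol` and `columns` are `stream` options. The values used are placeholders to tune per document, and `stream_kwargs` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def stream_kwargs(edge_tol: int = 500, row_tol: int = 10,
                  columns: Optional[List[float]] = None) -> Dict[str, object]:
    """Keyword arguments for camelot.read_pdf(..., flavor="stream").

    edge_tol extends the detected text edges vertically, row_tol merges text
    lying at nearly the same height into one row, and columns pins explicit
    column separators (x-coordinates in PDF points) when detection fails.
    """
    kwargs: Dict[str, object] = {"flavor": "stream", "edge_tol": edge_tol, "row_tol": row_tol}
    if columns:
        # Camelot expects one comma-separated string per table area
        kwargs["columns"] = [",".join(str(x) for x in columns)]
    return kwargs


def read_borderless(path: str, pages: str = "1", **options):
    import camelot  # installed from PyPI as camelot-py
    return camelot.read_pdf(path, pages=pages, **stream_kwargs(**options))
```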

7 votes, 3 answers

No module named 'camelot.ext'

I have been trying to run Excalibur after installing it with pip. It asked me to install camelot, and after that this error popped up: Traceback (most recent call last): File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main return…
Virus
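
This error means the code being run expects a `camelot.ext` subpackage that the installed Camelot does not provide. A version mismatch between Excalibur and Camelot is therefore a plausible cause, though this is an assumption, since the excerpt does not show the versions. The hypothetical helpers below report the installed `camelot-py` and `excalibur-py` versions and whether `camelot.ext` can be found. That is the information needed to pin matching releases.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    versions: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


def has_submodule(name: str) -> bool:
    """True if a dotted module such as "camelot.ext" can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:  # the parent package itself is missing
        return False


def report() -> Dict[str, object]:
    info: Dict[str, object] = dict(installed_versions(["camelot-py", "excalibur-py"]))
    info["camelot.ext importable"] = has_submodule("camelot.ext")
    return info
```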

6 votes, 2 answers

Find PDF Dimensions with Camelot

I am using Camelot to read complete PDFs and extract about 112 attributes from each one. I use table areas to extract the attributes:
test_variable = camelot.read_pdf(filename, flavor='stream', table_areas=['38, 340 ,50, 328'])…
A.A. F
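
Camelot's `table_areas` strings have the form `x1,y1,x2,y2`. Here (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner, in PDF coordinate space: points (1/72 inch) measured from the bottom-left of the page. An area measured on a rendered image instead uses pixels from the top-left, so it has to be scaled and then flipped against the page height. `to_table_area` is a hypothetical helper, and its 72 dpi default is only an example.

```python
from typing import Tuple

POINTS_PER_INCH = 72.0


def to_table_area(box_px: Tuple[float, float, float, float],
                  page_height_pt: float, dpi: float = 72.0) -> str:
    """Convert an image box (left, top, right, bottom) in pixels, top-left origin,
    into a Camelot table_areas string "x1,y1,x2,y2" in PDF points, bottom-left origin."""
    scale = POINTS_PER_INCH / dpi
    left, top, right, bottom = (v * scale for v in box_px)
    x1, y1 = left, page_height_pt - top       # top-left corner
    x2, y2 = right, page_height_pt - bottom   # bottom-right corner
    return ",".join(f"{v:g}" for v in (x1, y1, x2, y2))
```

For a US Letter page (612 by 792 points), `to_table_area((0, 0, 100, 100), 792)` gives `"0,792,100,692"`.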

5 votes, 1 answer

Camelot PDF dimensions

I have searched Stack Overflow extensively before posting this and have not been able to find anything on Camelot page dimensions. There is this question, which suggests using table_regions, but that does not solve the OP's problem or mine. I unfortunately…
Jinx
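
One way to get page dimensions in the same units as Camelot's coordinates (PDF points) is to read each page's MediaBox with pdfminer.six, a library Camelot itself depends on. `page_sizes` and `mediabox_size` are hypothetical helpers. The box is given as `[x0, y0, x1, y1]` and need not start at the origin, so the helper uses differences. Rotated pages (a non-zero `/Rotate`) would need width and height swapped, which this sketch ignores.

```python
from typing import List, Sequence, Tuple


def mediabox_size(mediabox: Sequence[float]) -> Tuple[float, float]:
    """Width and height in points of a PDF box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = (float(v) for v in mediabox)
    return abs(x1 - x0), abs(y1 - y0)


def page_sizes(path: str) -> List[Tuple[float, float]]:
    """(width, height) in points for every page, read with pdfminer.six."""
    from pdfminer.pdfpage import PDFPage

    with open(path, "rb") as fp:
        # consume the generator while the file is still open
        return [mediabox_size(page.mediabox) for page in PDFPage.get_pages(fp)]
```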

5 votes, 2 answers

Python PDF Parsing with Camelot and Extract the Table Title

Camelot is a fantastic Python library for extracting tables from a PDF file as data frames. However, I'm looking for a solution that also returns the table description text written right above the table. The code I'm using for extracting tables…
Ali Asad
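
Camelot returns only the table itself, so the caption has to be found separately: take the table's top edge and look for the nearest line of text just above it. The sketch assumes text lines and the table share the same PDF coordinate space. It reads lines with pdfminer.six's `extract_pages` and takes the table's top edge from `table.cells[0][0].y2`, which is an assumption about Camelot's `Cell` attributes. `title_above`, `text_lines` and the 30-point search window are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (y0, y1, text): bottom and top of one text line, PDF points, bottom-left origin
TextLine = Tuple[float, float, str]


def title_above(table_top: float, lines: Iterable[TextLine],
                max_gap: float = 30.0) -> Optional[str]:
    """Nearest non-empty text line starting at most max_gap points above table_top."""
    best_text: Optional[str] = None
    best_gap: Optional[float] = None
    for y0, _y1, text in lines:
        gap = y0 - table_top
        if text.strip() and 0 <= gap <= max_gap and (best_gap is None or gap < best_gap):
            best_text, best_gap = text.strip(), gap
    return best_text


def text_lines(path: str, page_number: int) -> Iterator[TextLine]:
    """Text lines of one page (1-based), read with pdfminer.six's layout analysis."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextBox, LTTextLine

    for layout in extract_pages(path, page_numbers=[page_number - 1]):
        for element in layout:
            if isinstance(element, LTTextBox):
                for line in element:
                    if isinstance(line, LTTextLine):
                        yield (line.y0, line.y1, line.get_text())
```

With a Camelot table in hand, the call would look like `title_above(table.cells[0][0].y2, text_lines("report.pdf", table.page))`, where the file name is a placeholder.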

4 votes, 3 answers

Problems extracting table data using camelot without an error message

I am trying to extract tables from this PDF link using camelot; however, when I try the following code:
import camelot
file = 'relacao_medicamentos_rename_2020.pdf'
tables =…
Gabriel Souto
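
When `read_pdf` finishes silently but returns nothing useful, a cheap first step is to check `tables.n` and retry with the other flavor. `lattice` (the default) needs ruling lines, while `stream` works from whitespace. The sketch takes the reader as a parameter, defaulting to `camelot.read_pdf`, so the logic can be exercised without a PDF. `read_with_fallback` is a hypothetical name, and its `pages="all"` default differs from Camelot's own default of `'1'`.

```python
from typing import Callable, Optional


def read_with_fallback(path: str, pages: str = "all",
                       reader: Optional[Callable] = None):
    """Try lattice first; if it finds no tables, retry the same pages with stream."""
    if reader is None:
        import camelot  # installed from PyPI as camelot-py
        reader = camelot.read_pdf
    tables = reader(path, pages=pages, flavor="lattice")
    if tables.n == 0:
        tables = reader(path, pages=pages, flavor="stream")
    return tables
```

If both flavors return nothing, check whether the PDF is a scanned image. Camelot works only with text-based PDFs.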

4 votes, 2 answers

tabula vs camelot for table extraction from PDF

I need to extract tables from PDFs, and these tables can be of any type: multiple headers, vertical headers, horizontal headers, etc. I have implemented the basic use cases for both and found tabula doing a bit better than camelot, though still not able to detect…
Niranjan Kumar

4 votes, 3 answers

How to find table region for camelot

As mentioned in the camelot docs, we can extract tables from a particular region like this:
tables = camelot.read_pdf('table_regions.pdf', table_regions=['170,370,560,270'])
But how can I find these regions for my PDF?
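
Recent Camelot releases include a visual-debugging helper, `camelot.plot`, which draws the parsed page with matplotlib. Hovering over the figure shows coordinates in PDF points, which can be read off for `table_regions` or `table_areas`. Both parameters take `x1,y1,x2,y2` strings: the top-left corner, then the bottom-right, with the origin at the page's bottom-left. `region_string` and `show_text_layout` are hypothetical helpers, and the plotting part assumes matplotlib is installed.

```python
def region_string(x1: float, y1: float, x2: float, y2: float) -> str:
    """Format a Camelot region "x1,y1,x2,y2" after checking corner order.

    (x1, y1) must be the top-left and (x2, y2) the bottom-right corner in PDF
    coordinates, where y grows upward, so x1 < x2 and y1 > y2.
    """
    if not (x1 < x2 and y1 > y2):
        raise ValueError(f"expected top-left then bottom-right, got {(x1, y1, x2, y2)}")
    return ",".join(f"{v:g}" for v in (x1, y1, x2, y2))


def show_text_layout(path: str, page: str = "1") -> None:
    """Plot the page's text so coordinates can be read by hovering."""
    import camelot
    import matplotlib.pyplot as plt

    tables = camelot.read_pdf(path, pages=page, flavor="stream")
    if tables.n == 0:
        raise ValueError("no tables detected on that page")
    camelot.plot(tables[0], kind="text")
    plt.show()
```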

4 votes, 1 answer

How to get table coordinates using python-camelot?

I am trying to parse some PDF files in order to extract some key information. Each PDF has a number of tables that contain part of this information. So I tried to use camelot to extract the tables and got good results, but I want to extract…
jessy
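
Each Camelot `Table` keeps its `cells`, a list of rows of `Cell` objects. Each cell's `x1, y1` is its bottom-left corner and `x2, y2` its top-right corner, in PDF points. A table's bounding box can therefore be read off its first and last cells. This relies on that attribute layout, an assumption worth checking against your Camelot version. Camelot also stores a private `_bbox` on tables, but that is not a stable API. `table_bbox` is a hypothetical helper.

```python
from typing import Dict


def table_bbox(table) -> Dict[str, float]:
    """Bounding box of a Camelot table in PDF points (origin bottom-left)."""
    first = table.cells[0][0]     # top-left cell
    last = table.cells[-1][-1]    # bottom-right cell
    x1, y1 = first.x1, last.y1    # left edge, bottom edge
    x2, y2 = last.x2, first.y2    # right edge, top edge
    return {"x1": x1, "y1": y1, "x2": x2, "y2": y2,
            "width": x2 - x1, "height": y2 - y1}
```

Usage would be `table_bbox(tables[0])` on the result of `camelot.read_pdf`.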

4 votes, 0 answers

How to switch table area coordinates in Python Camelot and Tabula-Py

I have obtained the coordinates of a table bounding box using Camelot, but I need to use tabula-py to extract the table data, as camelot is only extracting the first line in each table cell, even in lattice mode. I have noticed that when defining…
John
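
The two libraries measure from different corners. Camelot's `x1,y1,x2,y2` uses PDF coordinates with the origin at the bottom-left of the page. tabula-py's `area` is `[top, left, bottom, right]` in points measured from the top-left. Converting therefore means flipping the y values against the page height, which must come from the PDF itself (for example its MediaBox). `camelot_to_tabula` is a hypothetical helper, and the `tabula.read_pdf` call assumes `tabula-py` is installed.

```python
from typing import List, Sequence


def camelot_to_tabula(area: Sequence[float], page_height: float) -> List[float]:
    """Convert a Camelot box (x1, y1, x2, y2), origin bottom-left, into tabula-py's
    area [top, left, bottom, right], origin top-left. Both are in PDF points.

    min/max make the result independent of which corners the box lists, since
    table_areas strings give top-left/bottom-right while cell boxes give
    bottom-left/top-right.
    """
    x1, y1, x2, y2 = area
    top = page_height - max(y1, y2)
    bottom = page_height - min(y1, y2)
    return [top, min(x1, x2), bottom, max(x1, x2)]


def read_area_with_tabula(path: str, page: int, area: Sequence[float], page_height: float):
    import tabula  # installed from PyPI as tabula-py
    return tabula.read_pdf(path, pages=page, area=camelot_to_tabula(area, page_height))
```

For example, the Camelot area `38,340,50,328` on a 792-point-tall page becomes `[452, 38, 464, 50]` for tabula-py.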