Questions tagged [pdftables]

Pdftables is a Python package to extract tables from PDF files.

23 questions
4
votes
4 answers

AttributeError: module 'collections' has no attribute 'Iterable'

I am using the "pdftables" library to extract tables from a pdf. This is my code: import pdftables pg = pdftables.get_pdf_page(open("filename.pdf","rb"),253) print(pg) table = pdftables.page_to_tables(pg) print(table) I am getting this error…
Prassana K
  • 41
  • 1
  • 3
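A note on the error in the title above: the old `collections.Iterable` alias was removed in Python 3.10, and the abstract base class now lives only in `collections.abc`. The sketch below is a minimal illustration, not a confirmed fix for pdftables. The helper name `is_iterable` is hypothetical, and restoring the alias before importing a legacy package is an assumed workaround that may not cover everything such a package needs.

```python
import collections
import collections.abc


def is_iterable(obj):
    """Hypothetical helper: check iterability via the supported ABC location."""
    return isinstance(obj, collections.abc.Iterable)


# Assumed workaround: put the removed alias back so that older code which
# still references collections.Iterable can find it. This would have to run
# before the legacy package is imported.
if not hasattr(collections, "Iterable"):
    collections.Iterable = collections.abc.Iterable

print(is_iterable([1, 2, 3]))
print(is_iterable(42))
```

Updating the affected package, or switching to a maintained one, is usually preferable to patching the standard library namespace at runtime.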
3
votes
2 answers

Extract all tables from PDF in python

I have a PDF and want to extract all tables from it. When I run the code below, I get an empty list. import pdftables filepath = 'File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf' with open(filepath, 'rb') as fh: table =…
Neeraj Sharma
  • 174
  • 1
  • 3
  • 14
2
votes
1 answer

Camelot Cannot extract entire table

I'm using Camelot to extract table information from a PDF that I have converted from scanned to searchable using ocrmypdf (500 dpi). Camelot seems to be able to identify the table and extract most of the data within it, but it seems to be unable…
2
votes
0 answers

Trouble with tabulizer library in R recognizing non-alphanumeric (symbol) characters on a table in a PDF

I am using the tabulizer library in R to capture data from a table located inside a PDF on a public website (https://www.waterboards.ca.gov/sandiego/water_issues/programs/basin_plan/docs/update082812/Chpt_2_2012.pdf). The example table that I am…
2
votes
1 answer

iText 7 prevent cell to split on page break

I'm trying to generate a PDF with a table that contains cells with shapes. I override the CellRenderer class and draw the shapes inside the new class in DrawableCellRenderer#draw. Sometimes when the table needs to split and the cell has a row span I want to…
1
vote
1 answer

How to align multiple tables added to a single table using itextsharp in c#?

I have created a table with 3 columns and another table with 6 columns which is then added to another table to make it into a single table. I want to align the second column of the 3 column table and second columns of 6 column table like this: Can…
ANK
  • 71
  • 1
  • 8
1
vote
2 answers

How do I format/tag an accessible PDF table that spans multiple pages horizontally?

I'm responsible for remediating a PDF that has been generated by a third-party, proprietary system for which I have no access to the layout or design. The goal is to pass the Adobe Acrobat DC accessibility checker before publication. Some of the…
Glamador
  • 11
  • 2
1
vote
1 answer

Get absolute width from PdfPTable column (iText)

How can I get the absolute width of a column from iText when the table columns are specified with their relative sizes? What I tried: I specified 3 columns with their relative widths as floats like this: PdfPCell cell2; PdfPTable table2 = new PdfPTable(new…
0
votes
0 answers

Facing issue in extracting Tables from PDF with tabula

I am trying to extract multiple tables from a PDF, but it throws: Command '['java', '-Dfile.encoding=UTF8', ERROR. Link to the PDF: https://www.paypalobjects.com/marketing/web/US/en/merchant_fees/US-merchant-fees-24-July-2023.pdf PDF has 42…
0
votes
1 answer

Better Layout Output for PDF Tables Extracted using Camelot

I'm building a Python program using Camelot that extracts tables from a PDF (see code below). I am able to successfully execute the code, but I am hitting a roadblock on how to get a better output result. Specifically, I'm trying to get the code to…
0
votes
1 answer

Flutter Multi Image Pick and inserting in PDF table only last list inserted

I am new to Flutter and I am in the process of making an app that selects multiple images using the image_picker package and inserts them into a PDF table. I am able to get the images and make a list based on the number of rows required, however when the pdf…
Bimal
  • 23
  • 2
0
votes
0 answers

How to wrap contents in a table? .docx to PDF using apache POI

On converting the .docx to PDF using Apache POI, the contents in the table are not getting wrapped. The following is the code I am using to convert: XWPFDocument document = new XWPFDocument(is); …
abhi
  • 1
  • 1
0
votes
0 answers

How To Circumvent 504 Errors

I am working in ReactJs and one of the main aspects of our project is the ability to upload a scorecard and have all of its results parsed and placed into objects. However, due to the nature of the PDFs that get uploaded, there's a LOT of…
user18899735
0
votes
2 answers

I want to insert the photo taken with the camera into a cell of the table in the pdf. But I am getting the following error code

// I want to insert the photo taken with the camera into a cell of the table in the pdf. But I am getting the following error code. Reloaded 1 of 1223 libraries in 6.711ms. E/flutter (21256): [ERROR:flutter/lib/ui/ui_dart_state.cc(209)] Unhandled…
0
votes
1 answer

Auto-Breakline PdfTable Cells PdfFileWriter c#

I'm writing a program that retrieves customer data from a SQLite file and stores it in a PDF file in a PdfTable like: PdfContents contentsTable = new PdfContents(page); PdfTable table = new PdfTable(page, contentsTable, ArialNormal,…