Find PDF Dimensions with Camelot

Question

I am using Camelot to read complete PDFs and extract about 112 attributes from each one.

I use table areas to extract the attributes:

 import camelot

 test_variable = camelot.read_pdf(filename, flavor='stream',
                  table_areas=['38,340,50,328'])

The issue is that the table area is not constant for the same attribute across all documents. Sometimes the same attribute is shifted by a few pixels in the x or y direction in another document:

 test_variable = camelot.read_pdf(filename, flavor='stream', 
                 table_areas=['38,350,50,338'])

Is there a way to extract the same attribute reliably from every document, regardless of these small shifts in its position?

score 2 · Answer 1 · answered Jan 14 '19 at 11:07

Maybe the option table_regions (introduced in 0.7) can help you.

https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-table-regions

"When table_regions is specified, Camelot will only analyze the specified regions to look for tables."

You can define a larger table_regions area and Camelot will search for tables in this area.
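As a rough sketch of that idea, the snippet below pads one of the boxes from the question and passes the result as table_regions. The helper names expand_area and read_attribute, and the 15-point margin, are assumptions made for illustration and are not part of Camelot. It also assumes the usual Camelot convention that areas are 'x1,y1,x2,y2' strings in PDF coordinates, with (x1, y1) the top-left and (x2, y2) the bottom-right corner.

```python
def expand_area(area, margin):
    """Pad an 'x1,y1,x2,y2' area string by `margin` points on every side.

    Hypothetical helper. Assumes PDF coordinates (origin at the bottom-left),
    with (x1, y1) the top-left and (x2, y2) the bottom-right corner.
    """
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    left = max(x1 - margin, 0)    # move the left edge further left
    top = y1 + margin             # move the top edge up
    right = x2 + margin           # move the right edge further right
    bottom = max(y2 - margin, 0)  # move the bottom edge down
    return f"{left:g},{top:g},{right:g},{bottom:g}"


def read_attribute(filename, area, margin=15):
    """Let Camelot search a padded region instead of one exact box."""
    import camelot  # imported lazily so expand_area works without Camelot

    region = expand_area(area, margin)
    return camelot.read_pdf(filename, flavor="stream", table_regions=[region])
```

With a margin of 15, the box '38,340,50,328' becomes '23,355,65,313', which also covers the shifted box '38,350,50,338' from the question. A larger region may pick up neighbouring text, so the margin would need tuning per attribute.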

score 2 · Answer 2 · answered Dec 03 '19 at 21:04

Camelot uses OpenCV's coordinate system, and the page dimensions can be obtained from OpenCV's .shape attribute of the page image.

See the source code for Camelot's image processing here and OpenCV's documentation here.
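A minimal sketch of reading the dimensions this way, assuming the page has already been rendered to an image file. The function names page_image_size and image_to_pdf_point are hypothetical. Note that OpenCV image coordinates start at the top-left, while Camelot's documentation describes table_areas in PDF coordinates with the origin at the bottom-left, so the second helper flips the y-axis when converting.

```python
def page_image_size(image_path):
    """Return (width, height) in pixels of a rendered page image."""
    import cv2  # imported lazily; requires opencv-python

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    # .shape is (height, width, channels) for a colour image
    height, width = image.shape[:2]
    return width, height


def image_to_pdf_point(x, y, image_shape, pdf_size):
    """Convert an image pixel (x, y) to a PDF point.

    image_shape is (height, width, ...) as returned by .shape;
    pdf_size is (width, height) of the page in PDF points.
    """
    img_h, img_w = image_shape[:2]
    pdf_w, pdf_h = pdf_size
    scale_x = pdf_w / img_w
    scale_y = pdf_h / img_h
    # Image y grows downward, PDF y grows upward, so flip the axis.
    return x * scale_x, pdf_h - y * scale_y
```

For example, for a 612 x 792 point page rendered at 1224 x 1584 pixels, image_to_pdf_point(0, 0, (1584, 1224), (612, 792)) gives (0.0, 792.0), the top-left corner of the page in PDF coordinates.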

Jinx