1

I need to extract a table from an image-based PDF. I tried Camelot and tabula, but neither worked.

Any suggestions on how I can extract the tables?

Here is an image of the table:

[image]

Neither Camelot nor tabula detects the table:

[image]

Link to the PDF: https://drive.google.com/file/d/1atUmkNBkOGYFn43ZQreNqSg74XRhFP61/view?usp=sharing

Pravin

1 Answer

2

You can use Amazon Textract for this. It extracts both key-value pairs and tabular data from images. Here is how you can use it:

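from textractor import Textractor
from textractor.data.constants import TextractFeatures

# Use the AWS credentials stored under the "default" profile
extractor = Textractor(profile_name="default")

# Analyze the image, requesting both tables and form key-value pairs
document = extractor.analyze_document(
    file_source="./n3Zm0.png",
    features=[TextractFeatures.TABLES, TextractFeatures.FORMS],
)

# Draw the detected tables and key-value pairs over the image
document.visualize(with_words=False)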

visualization

You can export the table data to pandas, for example:

document.tables[0].to_pandas()
   0         1   2              3                             4
0  005       XX  1241 2156-001  Rostskyddsvätska              Rust-prev. fluid
1  004       96  2126 2039-130  M6M 30 - -10 spec             Nut
2  003       96  2122 2054-788  Pinnskruv M30x300 -10.9 S     Stud
3  002       96  488 9764-015   Bricka                        Washer
4  001       X   387 4402-002   Styrpinne                     Guide pin
5  Item No.      Article No.    Moteriol,type,etc Dimensions  Nome of item
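Note that the header row comes out last, because the drawing's parts list is printed bottom-up. A minimal sketch of one way to tidy this up, using only the standard library: it assumes the table has been turned into a list of rows (for a DataFrame, `df.values.tolist()` gives that), and the cell values and column split below are copied by hand from the output above, so they are an approximation. The helper names `rows_to_records` and `records_to_csv` are made up for this illustration.

```python
import csv
import io

# Stand-in for document.tables[0].to_pandas().values.tolist();
# cell values copied from the printed output, column split assumed.
rows = [
    ["005", "XX", "1241 2156-001", "Rostskyddsvätska", "Rust-prev. fluid"],
    ["004", "96", "2126 2039-130", "M6M 30 - -10 spec", "Nut"],
    ["003", "96", "2122 2054-788", "Pinnskruv M30x300 -10.9 S", "Stud"],
    ["002", "96", "488 9764-015", "Bricka", "Washer"],
    ["001", "X", "387 4402-002", "Styrpinne", "Guide pin"],
    ["Item No.", "", "Article No.", "Moteriol,type,etc Dimensions", "Nome of item"],
]


def rows_to_records(rows):
    """Treat the bottom row as the header and return the parts top-down as dicts."""
    *body, header = rows
    # Label blank header cells so every column has a usable key.
    names = [name or f"col_{i}" for i, name in enumerate(header)]
    return [dict(zip(names, row)) for row in reversed(body)]


def records_to_csv(records):
    """Serialize the records to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = rows_to_records(rows)
print(records_to_csv(records))
```

The OCR misreadings in the header ("Moteriol", "Nome") are kept as returned; you would probably want to rename those columns by hand.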

You can also get the list of key-value pairs:

document.key_values
[M6M : 30 -10 - spec,
 Pinnskruv : M30x300 -10.9 S,
 Article No. : ,
 Dimensions : ,
 Nome of item : ,
 Moteriol,type,etc : ,
 Item No. : ,
 Part of : Pack spec,
 Tolerances angles for and corner threads radii according chanfers, to : HS 2002 0020.,
 Scote : 1:2,
 Specification : ,
 Weight kg : 225,
 Reg : ,
 Other not indicated tol. : O,
 Description (EngLish) : Slewing rim yard mount.,
 Accepted by qual control : ,
 Accepted for prod by : GK,
 Prod.group : 355,
 Drawn by : A Sedin,
 Description (own Language) : Vändkranslager varvsmont.,
 Design checked by : ,
 Type design/group : ,
 Rev ind : ,
 Year Week : 93 26,
 Sheet : 1,
 Iss by Dept : 3451,
 Drowing checked by : SON,
 No of sh. : 1,
 Year Week : 4,
 Appd : ]
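Many of these pairs have an empty value, and "Year Week" appears twice, so a plain dict would silently drop one of them. A rough sketch of one way to clean this up, assuming you have already converted each pair to a `(key, value)` tuple of strings; the sample data is copied by hand from the list above and the helper name `group_filled_pairs` is made up for illustration, not part of Textractor.

```python
from collections import defaultdict

# Hypothetical stand-in: (key, value) strings copied from the printed list above.
pairs = [
    ("Scote", "1:2"),
    ("Weight kg", "225"),
    ("Specification", ""),
    ("Drawn by", "A Sedin"),
    ("Year Week", "93 26"),
    ("Year Week", "4"),
    ("Appd", ""),
]


def group_filled_pairs(pairs):
    """Drop pairs with empty values and collect values per key, keeping duplicates."""
    grouped = defaultdict(list)
    for key, value in pairs:
        value = value.strip()
        if value:
            grouped[key.strip()].append(value)
    return dict(grouped)


fields = group_filled_pairs(pairs)
print(fields)
```

Storing a list per key keeps both "Year Week" values, so you can decide afterwards which field on the drawing each one belongs to.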
Thomas