How to extract reconstructed table data's corresponding coordinates on the page?

Question

Using PaddleOCR, I can extract the tables from a page into an Excel file. It also generates a res file in the following format:

{"type": "text", "bbox": [46, 292, 1469, 319], "res": [], "img_idx": 0}
{"type": "text", "bbox": [44, 213, 1073, 248], "res": [], "img_idx": 0}
{"type": "table", "bbox": [37, 363, 2145, 1191], 
"res": {"cell_bbox": [[17.492332458496094, 29.683828353881836, 152.4942626953125, 70.82095336914062]....
        "boxes": [[625.0, 0.0, 781.0, 31.0]............, 
        "rec_res": [["Gross", 0.9994363188743591], ["Deviations", 0.9983635544776917]......
         "html": "<html><body><table><thead><tr><td rowspan=\"2\">Description</td><td colspan=\"4\">.....

I'm interested in getting the values for each column along with their coordinates on the page, so that I can mark them. Using the "html" property I can extract each column's values, but how do I get the corresponding page coordinates for those values? Is there another way to do this with PaddleOCR?

Something like '<tr>': ['Text', [20,20,20,10]..], ['Text2', [..]..]
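
To make the desired output concrete, here is a minimal sketch of the text-plus-coordinates pairing. It rests on two assumptions that the res file does not confirm: that "boxes" and "rec_res" are index-aligned (the i-th box belongs to the i-th recognized text), and that "boxes" are relative to the cropped table image, so adding the top-left corner of the table's "bbox" gives page coordinates. The function name `text_with_page_coords` and the one-box sample entry are hypothetical, built from the values shown above.

```python
def text_with_page_coords(table_entry):
    """Pair each recognized text with its box, shifted into page coordinates.

    Assumes res["boxes"] and res["rec_res"] are index-aligned and that the
    boxes are relative to the table crop (both are unverified assumptions).
    """
    x0, y0 = table_entry["bbox"][:2]  # top-left corner of the table on the page
    res = table_entry["res"]
    pairs = []
    for box, (text, _score) in zip(res["boxes"], res["rec_res"]):
        x1, y1, x2, y2 = box
        pairs.append((text, [x1 + x0, y1 + y0, x2 + x0, y2 + y0]))
    return pairs


# Hypothetical entry assembled from the first values in the res file above.
entry = {
    "type": "table",
    "bbox": [37, 363, 2145, 1191],
    "res": {
        "boxes": [[625.0, 0.0, 781.0, 31.0]],
        "rec_res": [["Gross", 0.9994363188743591]],
    },
}

print(text_with_page_coords(entry))
```

This only covers the OCR text boxes; it does not tie each value to a specific `<td>` in the "html" output, which is the part still missing.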

