I have been searching for a solution for a long time but have not been able to find one. There are several similar questions and answers, but they did not help me.
Basically:
- I have some Word documents (xxx.docx) that contain images.
- Each image is in WMF format (as I found when checking manually) and contains tabular information.
- I need to extract that table.
So the task reduces to extracting the image and then getting the table from the image using computer vision.
[1] When I try to extract the image, python-docx does not detect it as an image. I then found that the "aspose.words" library can detect it as an image object (even though it is not in a usual image format) and can write it out in EMF format (xxx.emf). [If there is any other way, please mention it.]
[2] Now I have the image (xxx.emf) in a folder, so the next task is to get the content of the image, which is entirely tabular information. However, I cannot read this format in Python.
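For the image-extraction step, one possible alternative that needs no third-party library: a .docx file is a ZIP archive, and embedded images are normally stored under word/media/ inside it. The following is only a minimal sketch, assuming the WMF/EMF files sit there as ordinary media parts. The function name extract_metafiles and the output folder are hypothetical, and it does not convert the metafiles or read the tables.

```python
import zipfile
from pathlib import Path

# Extensions to pull out; WMF and EMF are the formats mentioned in this question.
METAFILE_EXTENSIONS = {".wmf", ".emf"}


def extract_metafiles(docx_path, out_dir):
    """Copy every WMF/EMF file found under word/media/ in the .docx to out_dir.

    Hypothetical helper: assumes the images are stored as regular media parts.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip everything that is not an embedded media part.
            if not name.startswith("word/media/"):
                continue
            # Keep only metafiles; the comparison ignores case.
            if Path(name).suffix.lower() not in METAFILE_EXTENSIONS:
                continue
            target = out_dir / Path(name).name
            target.write_bytes(archive.read(name))
            extracted.append(target)
    return extracted
```

Calling extract_metafiles("xxx.docx", "extracted_images") would write any matching files into the extracted_images folder and return their paths; an empty list would suggest the images are stored some other way in the document.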
So, extracting and reading the EMF image is not my goal in itself; the goal is to get the table from the image into Excel. Please help me with these steps, or suggest other approaches that meet the requirement. If anyone needs the docx, it is available here in a repo. Thank you.