Extract image data based on coordinates or Tesseract and write the content to a doc/docx Word file

Question

I have an image. I want to extract its data into a docx file with the same layout and in readable form, using Python. I have tried applying Tesseract to the image and converting it to PDF using pytesseract, then converting the PDF to a Word file, but I am not able to maintain the layout and format.

Welcome to Stack Overflow. Can you show us the code you have tried so far and the problems you are having? Please read [How to ask](https://stackoverflow.com/help/how-to-ask) and edit your question accordingly so that we can help you. — Francisca Concha-Ramírez, Dec 12 '19 at 17:11

score 0 · Answer 1 · answered Dec 12 '19 at 17:12

This question has been answered before here. You can use the pdf2image library for this:

from pdf2image import convert_from_path

pages = convert_from_path('sample.pdf', 400)  # 400 is the image quality in DPI (default 200)

pages[0].save("sample.png")

Anteino

I want to extract the data from the image into a doc/docx file while keeping the layout structure. – Yashu Gupta Dec 12 '19 at 17:23
There are plenty of libraries for Python that do this; have you tried any of them? For example, this one: https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/ – Anteino Dec 12 '19 at 17:30
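
As a rough illustration of the coordinate-based approach the question asks about, here is a minimal sketch, not a tested solution for any particular image. It assumes the OCR output has the dictionary shape returned by pytesseract's image_to_data with output_type=Output.DICT, and it assumes python-docx is used for writing. The helper names group_words_into_lines and image_to_docx, the min_conf threshold, and the dpi value of 300 are all hypothetical choices made for this example. It only approximates layout by indenting each line according to its left offset; tables, columns, and fonts are not reconstructed.

```python
from collections import defaultdict


def group_words_into_lines(data, min_conf=0):
    """Group OCR words into text lines, keeping each line's pixel position.

    `data` is assumed to look like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Skip empty entries and low-confidence words (non-word rows have conf -1).
        if not text.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["top"][i], text))

    result = []
    for key in sorted(lines):  # block, paragraph, line order
        words = sorted(lines[key])  # left to right within the line
        result.append({
            "left": words[0][0],
            "top": min(w[1] for w in words),
            "text": " ".join(w[2] for w in words),
        })
    return result


def image_to_docx(image_path, docx_path, dpi=300):
    """Hypothetical helper: OCR an image and write indented lines to a .docx."""
    # Imported lazily so the grouping helper works without these packages.
    import pytesseract
    from PIL import Image
    from docx import Document
    from docx.shared import Inches

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    lines = group_words_into_lines(data)
    if not lines:
        return
    base = min(line["left"] for line in lines)  # leftmost text edge
    doc = Document()
    for line in lines:
        para = doc.add_paragraph(line["text"])
        # Convert the pixel offset to inches using the assumed scan DPI.
        para.paragraph_format.left_indent = Inches((line["left"] - base) / dpi)
    doc.save(docx_path)
```

Calling image_to_docx("sample.png", "sample.docx") would then write one paragraph per detected line. The dpi argument should match the resolution of the source image, otherwise the indentation will be scaled incorrectly.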