I would like to run a script on a folder full of Word documents that reads through each document and pulls out the images and their captions (the text directly below each image). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a .docx file and trigger an action whenever an image is found? Thank you for any help! I am using Python 2.7.
4 Answers
3
You may find some inspiration in this post: "How can I search a word in a Word 2007 .docx file?"
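A .docx file is a ZIP archive of XML parts, so one way to act on that hint is to read the archive directly with the standard library. The sketch below is only an illustration under some assumptions: the function names `images_with_captions` and `extract_folder` are made up for this example, the caption is assumed to be the paragraph immediately after the one holding the picture, and image targets are assumed to be relative paths such as `media/image1.png`. It avoids Python 3 only syntax, but treat Python 2.7 compatibility as untested.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs used inside a .docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def images_with_captions(docx_path):
    """Return a list of (image part name, caption text) pairs for one document."""
    pairs = []
    with zipfile.ZipFile(docx_path) as archive:
        body = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
    # Map relationship ids (rId1, ...) to targets relative to the word/ folder.
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.iter("{%s}Relationship" % PKG_REL)}
    paragraphs = list(body.iter("{%s}p" % W))
    for index, paragraph in enumerate(paragraphs):
        # Each embedded picture shows up as an <a:blip r:embed="rIdN"> element.
        for blip in paragraph.iter("{%s}blip" % A):
            rel_id = blip.get("{%s}embed" % R)
            if rel_id not in targets:
                continue  # linked rather than embedded picture; skipped here
            # Assumption: the caption is the text of the next paragraph.
            caption = ""
            if index + 1 < len(paragraphs):
                following = paragraphs[index + 1]
                caption = "".join(t.text or "" for t in following.iter("{%s}t" % W))
            pairs.append(("word/" + targets[rel_id], caption))
    return pairs


def extract_folder(folder, out_dir):
    """Save every embedded image under out_dir and return {saved path: caption}."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    captions = {}
    for name in sorted(os.listdir(folder)):
        # Skip non-docx files and Word's "~$" lock files.
        if not name.lower().endswith(".docx") or name.startswith("~$"):
            continue
        path = os.path.join(folder, name)
        stem = os.path.splitext(name)[0]
        with zipfile.ZipFile(path) as archive:
            for part, caption in images_with_captions(path):
                saved = os.path.join(out_dir, stem + "_" + os.path.basename(part))
                with open(saved, "wb") as handle:
                    handle.write(archive.read(part))
                captions[saved] = caption
    return captions
```

Calling `extract_folder("docs", "images")` would write files such as `images/report_image1.png` and return their captions. Pictures inside tables or text boxes, and captions that are not in the very next paragraph, would need extra handling.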

Community

Fredrik Pihl
2
You can use the Python module docx2txt to extract text as well as images from .docx files.
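A minimal sketch of how that might look for a whole folder, assuming docx2txt is installed (`pip install docx2txt`) and using its `process(docx_path, img_dir)` call, which returns the document text and writes the images into `img_dir`. The wrapper name `extract_with_docx2txt` and its `process` parameter are hypothetical additions for this example; the parameter only exists so the loop can be tried out without the library.

```python
import os


def extract_with_docx2txt(folder, out_root, process=None):
    """Run docx2txt over every .docx in folder; return {file name: extracted text}."""
    if process is None:
        import docx2txt  # third-party package, imported only when needed
        process = docx2txt.process
    texts = {}
    for name in sorted(os.listdir(folder)):
        # Skip non-docx files and Word's "~$" lock files.
        if not name.lower().endswith(".docx") or name.startswith("~$"):
            continue
        # One image folder per document so image file names do not collide.
        img_dir = os.path.join(out_root, os.path.splitext(name)[0])
        if not os.path.isdir(img_dir):
            os.makedirs(img_dir)
        texts[name] = process(os.path.join(folder, name), img_dir)
    return texts
```

As far as I know, docx2txt returns the text and the images separately, so matching each image to its caption would still be a manual step.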

Ankush Shah
-2
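    import docx  # provided by the python-docx package

    document = docx.Document(filepath)  # filepath: path to a .docx file
    # inline_shapes covers pictures placed inline with the text
    for image in document.inline_shapes:
        print(image.width, image.height)

Try this. It iterates over the document's inline images and prints each one's width and height.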

André Kool

Sathish Kumar MK