6

I would like to run a script on a folder full of word documents that reads through the documents and pulls out images and their captions (text right below the images). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a docx file and have an event occur when an image is found? Thank you for any help! I am using Python 2.7.

4 Answers4

4

Docx files can be unzipped for extracting the images.

Kevin C.
  • 2,499
  • 1
  • 27
  • 41
3

Find some inspiration in this post How can I search a word in a Word 2007 .docx file?

Community
  • 1
  • 1
Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130
2

You can use the python module docx2txt for extracting text as well as images from docx files

Ankush Shah
  • 938
  • 8
  • 13
-2
document =docx.Document(filepath)
for image in document.inline_shapes:
    print (image.width, image.height)

Try this it will work.

André Kool
  • 4,880
  • 12
  • 34
  • 44