I've searched the documentation for python-docx and other packages, as well as Stack Overflow, but could not find how to remove all images from docx files with Python.
My exact use case: I need to convert hundreds of Word documents to a "draft" format to be viewed by clients. Those drafts should be identical to the original documents, except that all images must be deleted or redacted.
Sorry for not including an example of things I tried; all I have is hours of research that turned up nothing useful. I found this question on how to extract images from Word files, but that doesn't delete them from the actual document: Extract pictures from Word and Excel with Python
From there and other sources I've learned that docx files can be read as plain zip archives. I don't know whether it's possible to "re-zip" a file without the images without affecting the integrity of the docx file, but thought this might be a path to a solution. (Edit: simply deleting the images works, but prevents python-docx from continuing to work with the file because of missing references to the images.)
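To make the "re-zip" idea concrete, here is a minimal sketch using only the standard-library zipfile module. It assumes the images are stored under word/media/ inside the archive, which is the usual layout but is an assumption here; the function name strip_media and its media_prefix parameter are hypothetical, made up for illustration.

```python
import zipfile


def strip_media(src_path, dst_path, media_prefix="word/media/"):
    """Copy a .docx archive to dst_path, skipping entries under media_prefix.

    Assumption: embedded images live under word/media/ in the archive.
    Returns the list of archive entries that were left out.
    """
    removed = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename.startswith(media_prefix):
                # Skip the image part; nothing else in the archive is touched.
                removed.append(item.filename)
                continue
            # Copy every other part across unchanged.
            dst.writestr(item, src.read(item.filename))
    return removed
```

This only drops the image files themselves. The document XML and relationship parts still point at the removed entries, which is presumably why python-docx can no longer work with the result.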
Any ideas?