docx to txt:
I tried the following command to extract text from a docx file. It does not work when the docx contains images.
unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
For pptx to txt, I found a Perl script that extracts the text. It has the same problem: it does not work when the pptx contains images.
I want the extracted text so that I can search across documents. Even a command or script that skips the images and converts only the text content of a docx to txt would help!
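For reference, here is a rough sketch of the kind of script I have in mind, not a tested solution. It assumes the usual OOXML layout, where the body text is in word/document.xml (docx) or ppt/slides/slide*.xml (pptx) and pictures are stored as separate files under word/media/ or ppt/media/, so reading only the XML parts never touches the image data. The function names xml_to_text, docx_to_text and pptx_to_text are made up for illustration. It also assumes Info-ZIP unzip (for the wildcard member name) and that no attribute value contains a literal ">".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the tag-stripping regex
# assumes no attribute value contains a literal '>'.

# Read OOXML markup on stdin and print plain text, one paragraph per line.
xml_to_text() {
    # Join the input into one line so that a tag is never split across lines.
    tr -d '\r\n' |
    # 1) turn each paragraph end (</w:p> in docx, </a:p> in pptx) into a newline
    # 2) drop every remaining tag, including drawing/picture elements
    # 3) decode the five predefined XML entities (&amp; last)
    sed -e 's/<\/[wa]:p>/\
/g' \
        -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# Extract the main document part of a .docx file.
docx_to_text() {
    unzip -p "$1" word/document.xml | xml_to_text
}

# Extract all slide parts of a .pptx file (slides come out in archive
# order, which is not necessarily slide order).
pptx_to_text() {
    unzip -p "$1" 'ppt/slides/slide*.xml' | xml_to_text
}

# Usage (file names are placeholders):
#   docx_to_text some.docx > some.txt
#   pptx_to_text some.pptx > some-slides.txt
```

This ignores tabs, line breaks inside a paragraph, headers, footers and speaker notes, which is probably acceptable for a search index but would need more work for a faithful conversion.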