I would like to convert all .docx files in a directory (and its subdirectories) to text files from the command line, so that I can run grep on them afterwards. I found this
unzip -p tutu.docx word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'
here, which works well, but it prints the text to the terminal. I would like to write the text to a new file (.txt, for instance) in the same directory as the .docx file, and I would like a script that does this recursively.
I have this, using antiword, which does what I want for .doc files, but it doesn't work for .docx files.
find . -name '*.doc' | while IFS= read -r i; do antiword -i 1 "$i" >"${i%.doc}.txt"; done
I tried to mix both but without success... A command line that would do both at the same time would be appreciated!
Thank you