I found this helpful post on how to extract the text from a DOCX file, and I wanted to turn it into a small shell script. My attempt is as follows:
#!/bin/sh
if [[ $# -eq 0 ]]; then
    echo "pass in a docx file to get the text within"
    exit 1
fi
text="$(unzip -p $1 word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g')"
echo $text
However, this does not print the result as expected.
Any suggestions?
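For reference, here is a minimal sketch of how the script could look with the most likely problems addressed. It rests on three assumptions, none of them confirmed in this post: that `/bin/sh` is not bash, so `[[ ... ]]` is unavailable; that the unquoted `echo $text` collapses the newlines through word splitting; and that the `sed` in use may not treat `\n` in a replacement as a newline. The function names `xml_to_text`, `docx_to_text`, and `main` are hypothetical.

```shell
#!/bin/sh
# Sketch only; the function names are hypothetical, not from the original post.

# Read document.xml on stdin and write plain text, one paragraph per line.
# Pass 1 ends each paragraph with a real newline: a backslash followed by a
# literal newline is the portable spelling ("\n" in the replacement is a GNU
# sed extension). Pass 2 sees the paragraphs as separate lines and drops the
# remaining tags and any non-printable characters.
xml_to_text() {
    sed -e 's/<\/w:p>/\
/g' |
    sed -e 's/<[^>]\{1,\}>//g' -e 's/[^[:print:]]\{1,\}//g'
}

# Stream word/document.xml out of the archive; "$1" is quoted so that
# file names containing spaces survive.
docx_to_text() {
    unzip -p "$1" word/document.xml | xml_to_text
}

main() {
    # POSIX test syntax; [[ ... ]] is a bash/ksh extension.
    if [ "$#" -eq 0 ]; then
        echo "pass in a docx file to get the text within" >&2
        return 1
    fi
    text="$(docx_to_text "$1")"
    # Quoting "$text" keeps the newlines; printf avoids echo's portability quirks.
    printf '%s\n' "$text"
}
```

Adding `main "$@"` as the last line would turn this into a runnable script, for example `./docx2txt.sh report.docx` (both file names are made up). If the only goal is to print the text, the `text` variable could be dropped and `docx_to_text "$1"` called directly.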