
I found this helpful post on how to extract the text from a DOCX file, and I wanted to turn it into a little shell script. My attempt is as follows:

#!/bin/sh

if [[ $# -eq 0 ]]; then
    echo "pass in a docx file to get the text within"
    exit 1
fi

text="$(unzip -p $1 word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g')"
echo $text

However, this does not print the result as expected.

Any suggestions?

natemcintosh

1 Answer


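Thanks to shellcheck.net, I found that I needed to put quotes around the $1. The final script, as approved by shellcheck, also quotes $text in the echo and uses [ ] instead of [[ ]], which plain sh is not guaranteed to support: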

#!/bin/sh

if [ $# -eq 0 ]; then
    echo "pass in a docx file to get the text within"
    exit 1
fi

text=$(unzip -p "$1" word/document.xml | sed -e 's/<\/w:p>/\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g')
echo "$text"
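
To see why the quotes on the echo matter, here is a minimal sketch. The sample text is a hypothetical stand-in for the extracted document body, not output from a real DOCX file. Without quotes, the shell splits the value on whitespace and echo rejoins the words with single spaces, so the paragraph breaks that sed inserted are lost.

```sh
#!/bin/sh
# Hypothetical sample text standing in for the extracted document body.
text=$(printf 'first paragraph\nsecond paragraph\n')

# Unquoted: the shell splits $text on whitespace and echo rejoins the
# words with single spaces, so the line break disappears.
unquoted=$(echo $text)

# Quoted: the value reaches echo as a single argument, newline intact.
quoted=$(echo "$text")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted form prints everything on one line, which is probably what "does not print the result as expected" looked like in the original attempt.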
David C. Rankin
natemcintosh
  • Good job solving the issue. Both command substitution and process substitution run in subshells, each with its own separate environment. As such, you must quote within them just as you would quote any command on the command line. – David C. Rankin Jul 14 '21 at 01:28
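
The point in the comment above can be checked with a small sketch. The count_args helper and the filename are hypothetical, chosen only to show that an unquoted variable inside $(...) is split into separate arguments just as it would be on a plain command line.

```sh
#!/bin/sh
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical filename containing a space.
file="my report.docx"

# Unquoted inside $(...): the name is split into two arguments.
unquoted_count=$(count_args $file)

# Quoted inside $(...): the name stays a single argument.
quoted_count=$(count_args "$file")

echo "unquoted: $unquoted_count, quoted: $quoted_count"
```

With an unquoted $1, unzip would receive a name like this as two separate arguments and fail to find the archive.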