
I am trying to write a script named letter.sh that unzips a Word file, extracts the text and images, and saves them in a directory. I think I have managed to unzip the file and extract the images, but I am struggling to extract the text from the document.xml file.

The sentences I want to extract are in this format:

<w:t>text</w:t>

I have tried using grep, but it doesn't work.

grep "<w:t>*</w:t>" ~word/document.xml < touch letter.txt

I would appreciate it if anyone could guide me onto the right path.

Thank you.

Cyrus
robe320
  • grep isn't a good choice to parse xml, but in your attempt, you were missing `.` before `*` (also, use single quotes instead of double quotes - see https://mywiki.wooledge.org/Quotes) – Sundeep Oct 30 '20 at 12:51
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Oct 30 '20 at 13:12
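
To make the first comment concrete, here is a minimal sketch of a corrected grep call. It is not a definitive solution: the sample XML, the temporary directory and the output path are made up for illustration, and it assumes a grep that supports `-o` (GNU and BSD grep both do). It also assumes each `<w:t>` element sits on one line and contains no nested tags, and it does not decode entities such as `&amp;`, which is why a real XML parser is still the more robust route. Note that in the original command `< touch letter.txt` tries to read input from a file named `touch`; writing the result to a file needs `>` instead.

```shell
# Hypothetical sample standing in for word/document.xml (not the asker's file).
tmpdir=$(mktemp -d)
cat > "$tmpdir/document.xml" <<'EOF'
<w:document><w:body><w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:tab/><w:t xml:space="preserve"> world</w:t></w:r></w:p><w:p><w:r><w:t>Second line</w:t></w:r></w:p></w:body></w:document>
EOF

# -o prints every match on its own line instead of the whole input line.
# The pattern is single-quoted so the shell does not touch it.
# "( [^>]*)?" allows attributes such as xml:space="preserve" but rejects
# other tags that merely start with "w:t" (for example <w:tab/>).
# "[^<]*" matches the text and cannot run past the closing tag.
grep -oE '<w:t( [^>]*)?>[^<]*</w:t>' "$tmpdir/document.xml" |
  sed -e 's/<[^>]*>//g' > "$tmpdir/letter.txt"   # strip the tags, keep the text

cat "$tmpdir/letter.txt"
```

Each text run ends up on its own line in letter.txt, so a sentence that Word has split across several runs is split here as well.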
