1

See XML example below.

Using a bash script, how could I extract the strings between all the "from" tags in the XML file into, say, an array?
For example, something like array=[Ben, Jani, James, Harry, ...]

Example XML file:

<note>
<to>Tove</to>
<from>Ben</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Tove</to>
<from>James</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Sean

3 Answers

0

EDIT: @jil pointed out that my initial solution only works if the XML file is formatted as the OP pasted it, and fails if there are multiple <from> tags per line. The following code fixes that issue by removing all newline characters first:

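#!/bin/bash

# Join the file into one line. sed reads line by line, so "s/\n//g"
# never sees a newline; tr deletes them reliably.
one_line=$(tr -d '\n' < file.xml)

# grep -o prints each match on its own line (-P needs GNU grep);
# sed strips the tags, and read keeps names with spaces intact.
NAMES=()
while IFS= read -r name
do
    NAMES+=("$name")
done < <(grep -Po '<from>.*?</from>' <<< "$one_line" | sed 's/<from>\(.*\)<\/from>/\1/')

echo "${NAMES[@]}"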

Then you can reference each name like ${NAMES[0]}, ${NAMES[1]}, ${NAMES[2]}, etc.

The echo at the end of the script prints every element of the array, which is handy for testing.
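
As a minimal sketch of working with the resulting array (the contents below are hypothetical sample values, not read from the file), indexing, counting, and looping could look like this. The quotes around the expansion matter once a name contains a space:

```shell
#!/bin/bash
# Hypothetical array contents; the last name contains a space on purpose.
NAMES=("Ben" "Jani" "Mary Ann")

echo "first: ${NAMES[0]}"
echo "count: ${#NAMES[@]}"

# Quoting "${NAMES[@]}" keeps "Mary Ann" as one element in the loop.
for name in "${NAMES[@]}"; do
    echo "name: $name"
done
```

Without the quotes, the loop would see "Mary" and "Ann" as two separate words.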

drewyupdrew
0

This command is not XML-aware; it expects each <from> element to be on its own line.

$ arr=( $(sed -rn 's_<from>(.*)</from>_\1_p' xml) )
$ echo "${arr[@]}"
Ben Jani James
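
To illustrate the one-tag-per-line limitation, here is a small sketch using a hypothetical input line with two <from> elements. It also shows one possible workaround, assuming the element text contains no nested tags: match one element at a time with grep -o, then strip the tags. It uses sed -E, the extended-regex spelling accepted by current GNU and BSD sed, and mapfile, which needs bash 4 or later.

```shell
#!/bin/bash
# Hypothetical input: two <from> elements on the same line.
line='<from>Ben</from><from>Jani</from>'

# The greedy (.*) runs from the first <from> to the last </from>.
greedy=$(sed -En 's_<from>(.*)</from>_\1_p' <<< "$line")
echo "$greedy"    # prints: Ben</from><from>Jani

# grep -o emits each match on its own line; sed then strips the tags.
mapfile -t arr < <(grep -o '<from>[^<]*</from>' <<< "$line" | sed -E 's_</?from>__g')
echo "${arr[@]}"  # prints: Ben Jani
```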
karakfa
-1

You should use an XML-aware command-line tool such as xmlstarlet, xmllint, or xpath (from the XML::XPath Perl module).

E.g. using xpath:

array=( $(xpath -q -e "//from/text()" input_file.xml) )

Using xmllint and sed:

array=( $(xmllint --xpath '//from' input_file.xml \
          | sed 's#</\?from># #g') )

P.S. Your sample input is not well-formed (it is missing a root element).
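
One way to work around the missing root element, sketched under assumptions: wrap the fragment in a made-up root tag before handing it to the parser. The name `notes` and the inline sample data are hypothetical, and the xmllint call only runs if the tool is installed. How xmllint separates multiple text nodes in its output varies between versions, so treat the printed result as something to check locally.

```shell
#!/bin/bash
# Hypothetical fragment with no root element, like the sample in the question.
fragment='<note><from>Ben</from></note>
<note><from>Jani</from></note>'

# Wrap the fragment in an invented root element so it becomes well-formed.
wrapped=$(printf '<notes>\n%s\n</notes>\n' "$fragment")

# Query it only when xmllint is available; "-" makes xmllint read stdin.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --xpath '//from/text()' - <<< "$wrapped"
fi
```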

jil