
I'm having some trouble finding a way to get all matched values from a string. I have an XML string stored in a variable. From that variable I extract a string with grep. That works well for one match, but since grep returns only the first matched value, it doesn't work quite how I want it to.

XML="..."

VALUE=($(grep -oP "<tag>(.*)</tag>" <<<"${XML}" | cut -d ">" -f 2 | cut -d "<" -f 1))

Is there a better/smarter way to tackle this than finding a value, replacing it in the existing XML string so it no longer matches, and then running that in a loop until no matches are found?

Short XML example:

<?xml version="1.0" encoding="UTF-8"?>
<xmlDoc>
  <docName>...</docName>
  <formats>
    <format>
      <name>a:1</name>
    </format>
    <format>
      <name>b:2</name>
    </format>
  </formats>
</xmlDoc>
pcenta
  • https://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex – hek2mgl Nov 23 '20 at 10:15
  • I know this isn't an ideal solution, but I have pretty basic XML that doesn't have any attributes. – pcenta Nov 23 '20 at 10:31
  • You should add a snippet of the expected XML. Also, you're not going to find the solution with `grep`. You need lookup groups and such; try `perl -pe 's///' $file` instead. – Bayou Nov 23 '20 at 10:36
  • I believe this is what you're looking for: https://stackoverflow.com/questions/26709071/linux-bash-xmllint-with-xpath – Dominique Nov 23 '20 at 10:58
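Regarding the lookaround suggestion in the comments, here is a minimal sketch of how greedy and lazy matching differ with `grep -P`. It assumes GNU grep built with PCRE support and an XML string that sits on a single line; the variable `xml`, the array `values` and the one-line condensed document are illustrative only, not taken from the question's code. The greedy `(.*)` runs from the first `<name>` to the last `</name>` on a line, so a one-line document yields a single match. A lazy `.*?` stops at the nearest `</name>`, and `\K` (drop what was matched so far) plus the lookahead `(?=</name>)` keep both tags out of the printed match, so no `cut` is needed.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with PCRE support (-P) and XML on one line.
# Hypothetical one-line condensed version of the question's example.
xml='<xmlDoc><formats><format><name>a:1</name></format><format><name>b:2</name></format></formats></xmlDoc>'

# Greedy (.*) spans from the first <name> to the last </name>: one match.
grep -oP '<name>(.*)</name>' <<<"$xml"
# -> <name>a:1</name></format><format><name>b:2</name>

# Lazy .*? stops at the nearest </name>; \K and the lookahead keep the tags
# out of the reported match, so each value is printed on its own line.
grep -oP '<name>\K.*?(?=</name>)' <<<"$xml"
# -> a:1
# -> b:2

# Collect the values into a bash array (mapfile needs bash 4 or newer).
mapfile -t values < <(grep -oP '<name>\K.*?(?=</name>)' <<<"$xml")
printf 'found %d values\n' "${#values[@]}"
# -> found 2 values
```

As the comments point out, this is still regex matching rather than XML parsing: it only holds up while `<name>` elements carry no attributes and are not nested, which is how the asker describes their document.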

1 Answer


Split the string into multiple lines (insert a newline after every closing tag) and then run the grep command on the result.

VALUE="$(sed 's#</tag>#</tag>\n#g' <<<"${XML}" | grep -oP "<tag>(.*)</tag>" | cut -d ">" -f 2 | cut -d "<" -f 1)"
etsuhisa
  • @Bayou: That's one of the reasons why using _grep_ for extracting information from XML is usually a silly idea. But the OP seems to be aware of this kind of fallacy (at least I conclude this from his comment), so for the concrete question he asks, I think the answer by etsuhisa is quite reasonable. – user1934428 Nov 23 '20 at 11:13
  • I've added a short XML example of my document. It seems that I can't make your solution work (for the 'name' tag). Could it be because of the whitespace? – pcenta Nov 23 '20 at 11:54
  • Sorry for the late reply. Have you changed the `tag` part to `name` as follows? `VALUE="$(sed 's#</name>#</name>\n#g' <<<"${XML}" | grep -oP "<name>(.*)</name>" | ...` – etsuhisa Nov 29 '20 at 11:24
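Putting the answer and the follow-up comment together for the `name` tag, here is a self-contained sketch. It assumes GNU sed (which turns `\n` in the replacement into a newline; BSD sed does not) and GNU grep with `-P`; the document is a one-line condensed version of the question's short XML example, and the names `xml` and `names` are illustrative only. Inserting a newline after every `</name>` leaves at most one closing tag per line, so even the greedy `(.*)` can no longer swallow several values.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (\n in the replacement) and GNU grep with -P.
# One-line condensed version of the question's short XML example.
xml='<?xml version="1.0" encoding="UTF-8"?><xmlDoc><docName>...</docName><formats><format><name>a:1</name></format><format><name>b:2</name></format></formats></xmlDoc>'

# Put a newline after every </name> so each line holds at most one value,
# then extract <name>...</name> and strip the tags with the two cut calls.
mapfile -t names < <(
  sed 's#</name>#</name>\n#g' <<<"$xml" |
    grep -oP '<name>(.*)</name>' |
    cut -d '>' -f 2 |
    cut -d '<' -f 1
)

printf '%s\n' "${names[@]}"
# -> a:1
# -> b:2
```

Whitespace should not be the problem here: if `${XML}` keeps the indentation shown in the question, the same pipeline gives the same two values, because `grep -o` prints only the matched text and never the leading spaces.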