
I'm searching for XML files that have certain properties. For example, files that contain the following pattern:

<param-value>
  <name>Hosts</name>
  <description>some description</description>
  <value></value>
</param-value>

For such files, I'd like to parse the value of another tag, such as:

<param-value>
  <name>Roles</name>
  <description>some description</description>
  <value>asdf</value>
</param-value>

And print out the file name along with "asdf". What's the simplest way to accomplish this from the command line?

One approach I was thinking of was just using grep with the -l option to filter the matching files out, and then using xargs grep to extract the value of Roles. However, grep doesn't work well with multi-line regexes. I saw another question that showed it could be done with the -Pzo options, but didn't have any luck getting it to work in my case. Is there a simpler approach?

jonderry
  • Is there any particular reason you don't want to use a scripting language such as perl? – Tom Feb 08 '12 at 20:07
  • No, a perl solution would be great, preferably a compact one-liner, but I don't know the best way to go about writing it. – jonderry Feb 08 '12 at 20:28
  • It would be helpful to have a solution that runs with just the most basic tools, though. xmlstarlet, xpath, and Perl's XPath module are not installed on the system on which I'm going to perform the search. – jonderry Feb 08 '12 at 23:13
  • Possible duplicate of [How to parse XML in Bash?](http://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash) – Ciro Santilli OurBigBook.com Oct 07 '15 at 10:56
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – jww May 30 '19 at 07:51
  • The simplest for me is to use [Saxon](http://saxon.sourceforge.net/) from the command line. Here's an example of using [XPath on the command line](https://stackoverflow.com/questions/8997709/how-to-use-saxon-xpath-processor-w-o-coding-in-java/8999664#8999664). This, combined with a shell script, would do exactly what you're asking. – Daniel Haley Feb 08 '12 at 20:10
  • This looks like the most portable solution which is what I need. – Tony O'Hagan Jul 16 '14 at 03:12
  • I had hoped to work through your problem more carefully, but I have run out of time, sorry. Anyway - [perl](http://perl-xml.sourceforge.net/faq/) has some very good modules for reading xml. In particular, the following article, [perl and xml on the command line](http://www.xml.com/pub/a/2002/04/17/perl-xml.html), is probably of interest. – Tom Feb 08 '12 at 21:32
  • According to [the answer to this question](https://stackoverflow.com/questions/91791/grep-and-sed-equivalent-for-xml-command-line-processing), [XMLStarlet](http://xmlstar.sourceforge.net/) seems to be very good for this kind of thing. – Lars Kotthoff Feb 08 '12 at 20:12

4 Answers

The following Linux command uses xmllint with an XPath expression to extract the value from each XML file:

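# Loop over every XML file under the current directory (paths must not contain whitespace)
for xml in $(find . -name "*.xml")
do
  # Print the file name and the extracted value; awk keeps only lines where a value was found
  echo "$xml" $(xmllint --xpath "/param-value/value/text()" "$xml") | awk 'NF>1'
done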

Example output for matching XML files:

./test1.xml asdf
./test4.xml 1234
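
The XPath above selects every value regardless of the parameter name. As a rough sketch of how the condition from the question could be folded into the expression, the following prints the Roles value only for files that also contain a Hosts parameter with an empty value. The function name print_roles and the variable xpath are made up for illustration, and the sketch assumes an xmllint build that supports --xpath and that the empty value is written exactly as <value></value>, with no whitespace inside.

```shell
#!/bin/sh
# Hypothetical helper: for each file, print "<file> <Roles value>" when the file
# also has a Hosts parameter whose <value> is empty.
# Assumes xmllint (libxml2) with --xpath support is installed.
xpath='string(//param-value[name="Roles"][//param-value[name="Hosts" and value=""]]/value)'

print_roles() {
    for f in "$@"; do
        role=$(xmllint --xpath "$xpath" "$f" 2>/dev/null)
        # An empty string means no match (or a parse error): skip the file
        if [ -n "$role" ]; then
            printf '%s %s\n' "$f" "$role"
        fi
    done
}
```

It could be called as print_roles ./*.xml. Unlike the loop above, this form does not depend on param-value being the root element.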
Mark O'Connor
  • Didn't know xmllint could be used to parse XML. To me this is the best answer because xmllint is always installed as a system dependency (at least on CentOS/Redhat/...) – th3penguinwhisperer Dec 19 '18 at 07:59

I worked out a couple of solutions using basic perl/awk functionality (basically a poor man's parsing of the tags). If you see any improvements using only basic perl/awk functionality, let me know. I avoided dealing with multiline regular expressions by setting a flag when I see a particular tag and then reading the next <value> line. Kind of clumsy, but it works.

perl:

perl -ne '$h = 1 if m/Host/; $r = 1 if m/Role/; if ($h && m/<value>/) { $h = 0; print "hosts: ", $_ =~ /<value>(.*)</, "\n"}; if ($r && m/<value>/) { $r = 0; print "\nrole: ", $_ =~ /<value>(.*)</, "\n" }'

awk (needs gawk, because the three-argument form of match() is a GNU extension):

awk '/Host/ {h = 1} /Role/ {r = 1} h && /<value>/ {h = 0; match($0, "<value>(.*)<", a); print "hosts: " a[1]} r && /<value>/ {r = 0; match($0, "<value>(.*)<", a); print "\nrole: " a[1]}'
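
The one-liners above print both values for a single input but do not filter files or print file names. A minimal sketch of how the same flag idea might be extended to do that, using only POSIX awk features (no gawk-specific match()), is shown below. The function name roles_where_hosts_empty is hypothetical, and the sketch assumes one tag per line, as in the samples in the question; it is line matching, not real XML parsing.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <Roles value>" for every file whose
# Hosts parameter has an empty <value>.
# Assumes each <name> and <value> element sits on its own line.
roles_where_hosts_empty() {
    for f in "$@"; do
        awk -v file="$f" '
            # Remember the most recent parameter name
            /<name>/  { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
            # Attribute each value to the name seen just before it
            /<value>/ {
                val = $0; gsub(/.*<value>|<\/value>.*/, "", val)
                if (name == "Hosts") { seen = 1; hosts = val }
                if (name == "Roles") roles = val
            }
            END { if (seen && hosts == "" && roles != "") print file, roles }
        ' "$f"
    done
}
```

It could be invoked as roles_where_hosts_empty ./*.xml. Running awk once per file keeps the state from leaking between files, at the cost of one process per file.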
jonderry
$ xmlstarlet ed -u /param-value/name -v Roles -u /param-value/value -v asdf data.xml

<?xml version="1.0"?>
<param-value>
  <name>Roles</name>
  <description>some description</description>
  <value>asdf</value>
</param-value>
kev

I usually use Perl's XML::XSH2. You can process XML files interactively in it, or script it. The script would be something like (untested):

for my $file in { glob "*.xml" } {
    open $file ;
    my $param_value = //param-value[name="Hosts"] ;
    if $param_value echo $file $param_value/value ;
}
choroba