3

For a given xml file called configurations.xml I would like to extract the value of each conf element, and store it in a variable for later use.

<configurations>
  <conf name="bob"/>
  <conf name="alice"/>
  <conf name="ted"/>
  <conf name="carol"/>
</configurations>

The expected output is:

bob
alice
ted
carol

I have xpath and xmllint available. An XPath expression of //conf/@name selects the right nodes, but prints them as name="bob", which is what I'm trying to avoid.

Ciro Santilli OurBigBook.com
Claus Jørgensen

5 Answers

3
xmlstarlet sel -t -m '//configurations/conf' -v '@name' -n a.xml

This worked for me, since xmllint does not seem capable of it. Good intro here.

Tested on: xmlstarlet version 1.5.0, Ubuntu 14.04.
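Since the question asks to keep the values for later use, here is a minimal sketch of capturing that output in a shell variable and walking over it line by line. The helper names list_conf_names and each_name are hypothetical, and the sketch assumes xmlstarlet is installed and that no name contains a newline.

```shell
#!/bin/sh
# Hypothetical helper: run the query shown above against the file given as $1.
list_conf_names() {
    xmlstarlet sel -t -m '//configurations/conf' -v '@name' -n "$1"
}

# Hypothetical helper: act on each name held in a newline-separated string ($1).
# Reading line by line keeps names that contain spaces intact.
each_name() {
    printf '%s\n' "$1" | while IFS= read -r n; do
        if [ -n "$n" ]; then
            printf 'conf: %s\n' "$n"
        fi
    done
}

# Usage (command substitution strips the trailing newline):
#   names=$(list_conf_names configurations.xml)
#   each_name "$names"
```

A newline-separated string is used here rather than an array so that the sketch also runs under a plain POSIX sh.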

It fails, however, on large files: with ulimit -Sv 500000 (a limit of roughly 500 MB) it dies on a 1.2 GB XML file, and without the memory limit it jams my computer. See also:

Community
Ciro Santilli OurBigBook.com
1

I don't know how to achieve what you're trying to achieve with xmllint only.

Since you have xpath installed, you have Perl's XML::XPath too. So a little bit of Perl:

#!/usr/bin/perl

use XML::XPath;

my $xp=XML::XPath->new(filename => 'configurations.xml');

my $nodeset=$xp->find('//conf/@name');
foreach my $node ($nodeset->get_nodelist) {
    print $node->getNodeValue,"\0";
}

will output what you want, with each value terminated by a NUL character.

In a one-liner style:

perl -mXML::XPath -e 'foreach $n (XML::XPath->new(filename => "configurations.xml")->find("//conf/\@name")->get_nodelist) { print $n->getNodeValue,"\0"; }'

To retrieve them in, e.g., a Bash array:

#!/bin/bash

names=()
while IFS= read -r -d '' n; do
    names+=( "$n" )
done < <(
    perl -mXML::XPath -e 'foreach $n (XML::XPath->new(filename => "configurations.xml")->find("//conf/\@name")->get_nodelist) { print $n->getNodeValue,"\0" }'
)
# See what's in your array:
declare -p names

Note that at this point you have the option of turning to Perl and dropping Bash completely to solve your problem.

gniourf_gniourf
0

I searched everywhere for this seemingly simple answer. It appears that it is not possible for xmllint to print attribute values from multiple nodes. You can use string(//conf/@name), but that will only print a single value even if there are multiple nodes that match.

If you are stuck with xmllint, the only way is to use additional text processing. Here's a generic way that will parse out the attribute value. It assumes the values do not contain = or " characters.

xmllint --xpath '//conf/@name' configurations.xml |
tr ' ' '\n' | awk -F= '{print $2}' | sed 's/"//g'

The first pipe converts spaces to newlines.

The second pipe prints what's after the =

The last pipe removes all "
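The same extraction can also be sketched as a single awk step that splits each line on double quotes and prints every second field. The function name attr_values is hypothetical, and the sketch assumes that xmllint serialises each attribute with double quotes and that the values contain no literal " character. Unlike the pipeline above, it tolerates spaces and = signs inside the values.

```shell
#!/bin/sh
# Hypothetical helper: read xmllint-style attribute output on stdin,
# e.g.  name="bob" name="alice", and print the bare values one per line.
attr_values() {
    # With " as the field separator, fields 2, 4, 6, ... are the quoted values.
    awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

# Usage:
#   xmllint --xpath '//conf/@name' configurations.xml | attr_values
```

Because empty lines have no fields, they produce no output, so it should not matter whether xmllint prints the attributes on one line or one per line.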

wisbucky
-1

You can use the awk command to get it done.

[root@myserver tmp]# cat /tmp/test.xml
<configurations>
  <conf name="bob"/>
  <conf name="alice"/>
  <conf name="ted"/>
  <conf name="carol"/>
</configurations>
[root@myserver tmp]# awk -F \" '{print $2}' /tmp/test.xml |grep -v '^$'
bob
alice
ted
carol
[root@myserver tmp]#
Sriharsha Kalluru
-1

If you really want to use xpath and display only the attribute values without the "name=" part, then here's what worked for me:

xpath configurations.xml 'string(//conf/@name)' 2>/dev/null

In plain English, wrap your XPath query in string(), and suppress the verbose output of xpath by adding 2>/dev/null at the end.
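One caveat: in XPath 1.0, string() applied to a node set returns the value of the first node only, so a query of this shape yields a single name. A minimal sketch that walks the matches by position with xmllint instead is shown here. The helper name conf_names is hypothetical, and the sketch assumes an xmllint build that supports --xpath. It starts one xmllint process per match, so it is only reasonable for small files.

```shell
#!/bin/sh
# Hypothetical helper: print the name attribute of every conf element
# in the file given as $1, one value per line.
conf_names() {
    file=$1
    # count() evaluates to a number, which xmllint prints as plain text.
    count=$(xmllint --xpath 'count(//conf/@name)' "$file") || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # string() of exactly one attribute node is its bare value.
        xmllint --xpath "string((//conf/@name)[$i])" "$file"
        i=$((i + 1))
    done
}

# Usage:
#   names=$(conf_names configurations.xml)
```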

Alin Pandichi