Should I parse this XML with BASH?

Question

I'm trying to get from this xml example

<String Name="descResist">
    <Description><![CDATA["resist_type_chimney"]]></Description>
    <Flags>
        <ParFlg_Child/>
    </Flags>
    <Value><![CDATA["90_min."]]></Value>
</String>

this

descResist;resist_type_chimney 
descResist;90_min.

So, basically I need to extract the CDATA content and concat it with the value of Name.

One problem is that the element isn't always String; it could also be Integer, Title, Boolean, etc.

I tried this

$ grep -o "Name=\".*\"\|<\!\[CDATA\[.*\]\]>" file.xml | sed 's/<\!\[CDATA\[\"\(.*\)\"\]\]>/\1/'

which gives me

Name="descResist"
resist_type_chimney
90_min.

How can I prefix the next lines with value of Name string?

Like in

Name="descResist"
resist_type_chimney
90_min.
Name="anotherName"
foo_bar
Name="anoooother"
Name="notempty"
bar_foo

it gets a little complicated.

Is it also OK to work with XML like this? There shouldn't be any nested <tagType Name=... elements, so I guess that shouldn't be a problem.

EDIT: I'm working on Cygwin and looking for a simple bash/sed/awk solution.

Check http://stackoverflow.com/questions/4680143/how-to-parse-xml-using-shellscript — anishsane, Jul 02 '13 at 11:34

score 2 · Answer 1 · answered Jul 02 '13 at 11:45

I suggest using an XML parser. Here is an example in Perl using XML::Twig.

Content of script.pl:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

my $twig = XML::Twig->new(
        twig_handlers => {
                '//*[@Name]' => sub {
                        for my $d ( $_->descendants( '#CDATA' ) ) { 
                                (my $t = $d->text) =~ s/\A"(.*)"\z/$1/; 
                                printf qq|%s;%s\n|, $_->att( 'Name' ), $t; 
                        }   
                },  
        }   
)->parsefile( shift );

Run it like:

perl script.pl xmlfile

That yields:

descResist;resist_type_chimney
descResist;90_min.

Birei


I can't have Perl on Cygwin and am really looking for a bash/sed/awk solution. But thanks – bartimar Jul 02 '13 at 12:18
Have a look at http://perldoc.perl.org/perlcygwin.html . Not straightforward, but it may help. – anishsane Jul 02 '13 at 12:23
I can't install anything, that is the problem :) – bartimar Jul 02 '13 at 13:09

Radu C · Accepted Answer · 2013-07-02T23:55:32.947

Try this out:

#!/bin/bash

Name="InvalidName"
while IFS= read -r line; do
        case "$line" in
                Name=*) eval "$line" ;; # assuming $line is always bash-friendly Name="Value"
                *) echo "$Name;$line" ;;
        esac
done < <(egrep -o 'Name=".*"|<!\[CDATA\[.*?\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*)"\]\]>/\1/')

I've changed your command slightly to use extended regular expressions (that's why it's "egrep" and "sed -r") so it's a bit easier to read.
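
For comparison, here is a small sketch of the same pipeline written both ways. The function names bre_form and ere_form are made up for illustration and are not part of the answer. It assumes GNU grep and sed (as shipped with Cygwin), because \| inside a basic regular expression is a GNU extension. grep -E and sed -E are used here as equivalents of egrep and sed -r.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Basic regular expressions (BRE): alternation and grouping need backslashes.
# Note: \| in BRE is a GNU extension.
bre_form() {
    grep -o 'Name=".*"\|<!\[CDATA\[.*]]>' "$1" |
        sed 's/<!\[CDATA\["\(.*\)"]]>/\1/'
}

# Extended regular expressions (ERE): | and ( ) are operators as written.
# A literal ] needs no backslash outside a bracket expression in either form.
ere_form() {
    grep -E -o 'Name=".*"|<!\[CDATA\[.*]]>' "$1" |
        sed -E 's/<!\[CDATA\["(.*)"]]>/\1/'
}
```

Both functions take the XML file as their only argument, e.g. ere_form file.xml, and should produce the same three lines for the sample in the question.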

I don't like that eval I've used, but "export -n" does something strange for this case, and the code would get needlessly complex just to avoid the eval.

It's OK to "parse" XML in Bash if you're really really sure the text structure will not change. As soon as somebody decides to "optimize" the XML by collapsing it all into a single line, you're a bit toast.
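
To illustrate the single-line concern: once the whole document sits on one line, a greedy pattern such as Name=".*" matches from the first Name=" up to the last double quote on that line, so the line-based pipeline no longer works. One possible workaround, as a rough sketch only, is to let awk split records at "<" instead of at newlines. The name extract_pairs_awk is hypothetical. The sketch assumes that CDATA payloads never contain "<", that they are wrapped in double quotes as in the sample, and that elements carrying a Name attribute are not nested.

```shell
#!/bin/bash
# Hypothetical helper: prints "Name;value" pairs from the file given as $1.
# Assumptions: no "<" inside CDATA payloads, payloads are double-quoted,
# and elements with a Name attribute are not nested.
extract_pairs_awk() {
    awk -v RS='<' '
        BEGIN { name = "InvalidName" }

        # CDATA section: the record begins right after the "<" separator.
        /^!\[CDATA\[/ {
            value = substr($0, 9)             # skip the 8 characters of ![CDATA[
            end = index(value, "]]>")         # position of the terminator
            if (end) value = substr(value, 1, end - 1)
            sub(/^"/, "", value)              # strip the surrounding quotes
            sub(/"$/, "", value)
            print name ";" value
            next
        }

        # Any other tag carrying a Name attribute: remember its value.
        match($0, /Name="[^"]*"/) {
            name = substr($0, RSTART + 6, RLENGTH - 7)
        }
    ' "$1"
}
```

Called as extract_pairs_awk file.xml, this should give the same two descResist lines whether the sample is spread over several lines or collapsed into one. It is still text matching, not XML parsing, so the caveat above applies.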

EDIT

Here's a script without the ugly eval:

#!/bin/bash

Name="InvalidName"
while IFS= read -r line; do
        case "$line" in
                Name=*) export -n "$line" ;; # assuming $line is always bash-friendly Name=Value
                *) echo "$Name;$line" ;;
        esac
done < <(egrep -o 'Name=".*"|<!\[CDATA\[.*?\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*?)"\]\]>/\1/; s/Name="(.*)"/Name=\1/')
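
A variant that avoids both eval and export -n is to pull the value out with plain parameter expansion, so the extracted line is never interpreted as shell code. This is a minimal sketch, not the answer's own method: extract_pairs is a hypothetical name, and Name="[^"]*" is used instead of Name=".*" so that a match stops at the first closing quote.

```shell
#!/bin/bash
# Hypothetical helper: prints "Name;value" for every CDATA payload in $1.
# The Name value is taken with parameter expansion, never evaluated.
extract_pairs() {
    grep -E -o 'Name="[^"]*"|<!\[CDATA\[.*]]>' "$1" |
    sed -E 's/<!\[CDATA\["(.*)"]]>/\1/' |
    {
        name="InvalidName"
        while IFS= read -r line; do
            case "$line" in
                Name=\"*\")
                    line=${line#Name=\"}    # strip the leading Name="
                    name=${line%\"}         # strip the trailing quote
                    ;;
                *)
                    printf '%s;%s\n' "$name" "$line"
                    ;;
            esac
        done
    }
}
```

Usage would be extract_pairs file.xml. Like the scripts above, it still depends on each tag and each CDATA section sitting on its own line.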

Nice, this looks much better than my solution. But I don't really like that eval (same as you). — bartimar, Jul 02 '13 at 14:15
