I'm trying to get from this XML example

<String Name="descResist">
    <Description><![CDATA["resist_type_chimney"]]></Description>
    <Flags>
        <ParFlg_Child/>
    </Flags>
    <Value><![CDATA["90_min."]]></Value>
</String>

this

descResist;resist_type_chimney 
descResist;90_min.

So, basically I need to extract the CDATA content and concat it with the value of Name.

One problem is that the enclosing tag isn't always String; it could also be Integer, Title, Boolean, etc.

I tried this

$ grep -o "Name=\".*\"\|<\!\[CDATA\[.*\]\]>" file.xml | sed 's/<\!\[CDATA\[\"\(.*\)\"\]\]>/\1/'

which gives me

Name="descResist"
resist_type_chimney
90_min.

How can I prefix the following lines with the value of the Name attribute?

Like in

Name="descResist"
resist_type_chimney
90_min.
Name="anotherName"
foo_bar
Name="anoooother"
Name="notempty"
bar_foo

it gets a little complicated.

Is it even OK to work with XML like this? There shouldn't be any nested <tagType Name=... elements, so I guess that won't be a problem.

EDIT: I'm working on Cygwin and looking for a simple bash/sed/awk solution.

bartimar

2 Answers

I suggest using a parser. Here is an example that uses XML::Twig.

Content of script.pl:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

my $twig = XML::Twig->new(
        twig_handlers => {
                '//*[@Name]' => sub {
                        for my $d ( $_->descendants( '#CDATA' ) ) { 
                                (my $t = $d->text) =~ s/\A"(.*)"\z/$1/; 
                                printf qq|%s;%s\n|, $_->att( 'Name' ), $t; 
                        }   
                },  
        }   
)->parsefile( shift );

Run it like:

perl script.pl xmlfile

That yields:

descResist;resist_type_chimney
descResist;90_min.
Birei

Try this out:

#!/bin/bash

Name="InvalidName"
while IFS= read -r line; do
        case "$line" in
                Name=*) eval "$line" ;; # assuming $line is always bash-friendly Name="Value"
                *) echo "$Name;$line" ;;
        esac
done < <(egrep -o 'Name=".*"|<!\[CDATA\[.*?\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*)"\]\]>/\1/')

I've changed your command slightly to use extended regular expressions (that's why it's "egrep" and "sed -r") so it's a bit easier to read.
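As a small illustration of that difference (a sketch on a made-up one-line sample, not part of the script above): with GNU grep, basic syntax writes alternation as \| while extended syntax uses a plain |. The sample string and the variable names are only for demonstration, and the tighter [^"]* and [^]]* classes are my own substitution for .* so the matches stay separate.

```shell
#!/bin/bash
# Hypothetical sample line, used only for this comparison.
sample='<String Name="a"><Value><![CDATA["x"]]></Value></String>'

# Basic regular expressions: alternation is \| (a GNU extension).
# [^]]* means "any run of characters other than ]".
bre_out=$(printf '%s\n' "$sample" | grep -o 'Name="[^"]*"\|<!\[CDATA\[[^]]*]]>')

# Extended regular expressions: alternation is a plain |.
ere_out=$(printf '%s\n' "$sample" | grep -oE 'Name="[^"]*"|<!\[CDATA\[[^]]*]]>')

# Both variables should now hold the same two matches.
printf '%s\n---\n%s\n' "$bre_out" "$ere_out"
```

Both forms should find the same matches; the extended one just needs fewer backslashes.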

I don't like that eval I've used, but "export -n" does something strange for this case, and the code would get needlessly complex just to avoid the eval.

It's OK to "parse" XML in Bash if you're really really sure the text structure will not change. As soon as somebody decides to "optimize" the XML by collapsing it all into a single line, you're a bit toast.
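To make the single-line concern concrete, here is a rough sketch (not one of the scripts in this answer) that tightens the patterns so each match stops at the first closing delimiter. grep -o then prints every match on its own line even if the XML has no line breaks. The function name extract_pairs is made up for this example, and it assumes Name values contain no double quote and CDATA content contains no ] character. It is still pattern matching, not parsing, so the caveat above continues to apply.

```shell
#!/bin/bash
# Sketch only: prints "Name;cdata" pairs using plain parameter expansion.
# Assumptions: Name values contain no '"', CDATA content contains no ']'.
extract_pairs() {
    local name="InvalidName" line
    # grep -o emits one match per output line, regardless of input layout.
    grep -oE 'Name="[^"]*"|<!\[CDATA\[[^]]*]]>' "$1" |
    while IFS= read -r line; do
        case "$line" in
            'Name="'*)
                name=${line#Name=\"}       # drop the leading Name="
                name=${name%\"}            # drop the trailing quote
                ;;
            *)
                line=${line#'<![CDATA['}   # drop the CDATA opener
                line=${line%']]>'}         # drop the CDATA closer
                line=${line#\"}            # drop the surrounding quotes,
                line=${line%\"}            # if there are any
                printf '%s;%s\n' "$name" "$line"
                ;;
        esac
    done
}
```

Call it as extract_pairs file.xml. Note that the pattern would also match attributes whose names merely end in Name, so treat it as a starting point rather than a robust tool.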

EDIT

Here's a script without the ugly eval:

#!/bin/bash

Name="InvalidName"
while IFS= read -r line; do
        case "$line" in
                Name=*) export -n "$line" ;; # assuming $line is always bash-friendly Name=Value
                *) echo "$Name;$line" ;;
        esac
done < <(egrep -o 'Name=".*"|<!\[CDATA\[.*?\]\]>' file.xml | sed -r 's/<!\[CDATA\["(.*?)"\]\]>/\1/; s/Name="(.*)"/Name=\1/')
Radu C
  • Nice, this looks much better than my solution. But I don't really like that eval (same as you). – bartimar Jul 02 '13 at 14:15