Using shell commands to parse multiple nested tag values of an XML file

Question

I have this XML file:

<gp>
<mms>1110012</mms>
<tg>988</tg>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>12</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>2</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

<gp>
<mms>2221100</mms>
<tg>989</tg>
<mm>LongVelocity</mm>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>772</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>900</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

Now, I first need to search for "LongTime". If it is found, I have to look for the "Desti = Motion" value (which is inside <lkid>StartEle=ONE, Desti = Motion</lkid>) within the multiple nested sub-tags. If that is also found, I finally need the value inside the tag below it, which is 12 (<kk>12</kk>).

Please help. Any tool will do: awk, sed, grep, anything.

Thanks in advance.

Try looking at this answer: http://stackoverflow.com/questions/4680143/how-to-parse-xml-using-shellscript — mikea, Jan 09 '14 at 12:06
When I parse an XML stream, I prefer to use tools that are optimized for it. Many languages and commands usable from the shell support a DOM approach, XPath queries, etc.: Perl (which is provided by most Linux distributions), Python, PHP (its command-line interpreter lets you write shell scripts in PHP), xmllint, and so on. — Idriss Neumann, Jan 09 '14 at 12:40
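Following that suggestion, an XPath lookup with xmllint could look like the sketch below. It assumes libxml2's xmllint with `--xpath` support is installed; the function name `lookup_kk_xpath` is hypothetical. Because the sample has two top-level `<gp>` elements, it is not a well-formed XML document on its own, so the sketch wraps it in a dummy `<root>` element before querying.

```shell
# lookup_kk_xpath SECTION KEY  (hypothetical helper; reads the XML fragment on stdin)
# Assumes xmllint (libxml2) with --xpath is available.
lookup_kk_xpath() {
    # Wrap the fragment so that multiple <gp> elements form one valid document,
    # then select the <kk> of the <lv> whose <lkid> contains KEY, inside the
    # <gp> whose <mm> equals SECTION. normalize-space() returns the first match
    # as a string, without surrounding whitespace.
    { echo '<root>'; cat; echo '</root>'; } |
        xmllint --xpath "normalize-space(//gp[mm='$1']/lv[contains(lkid, '$2')]/kk)" -
}

# Usage: lookup_kk_xpath LongTime 'Desti = Motion' < file
```

The arguments are pasted directly into the XPath expression, so values containing a single quote would need extra escaping.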

Jotne · Accepted Answer · 2014-01-09T12:42:42.660

Using awk

awk -F"[<>]" '/LongTime/ {f=1} f && /Desti = Motion/ {getline;print $3;f=0}' file
12

This searches for LongTime and, if found, sets flag f=1.
If flag f is true and Desti = Motion is found, it gets the next line, prints the value, and resets flag f.

To make sure it does not print a Desti = Motion value from another section when the LongTime section itself does not contain Desti = Motion, you can reset flag f whenever a new section that is not LongTime starts, by adding /^<mm>/ && !/LongTime/ {f=0}:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {getline;print $3;f=0}' file
12

To avoid relying on getline in case there are extra blank lines between the two tags, use this:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {q=1} f && q && /<kk>/ {print $3;f=q=0}' file
12

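This just adds an extra test: flag q remembers that Desti = Motion was seen, and the value is printed from the next line that contains <kk>, however many lines later it comes.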

Here is a more readable version:

awk -F"[<>]" '
    /LongTime/              {f=1}
    /^<mm>/ && !/LongTime/  {f=0}
    f && /Desti = Motion/   {q=1} 
    f && q && /<kk>/        {print $3;f=q=0}
    ' file
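The same flag logic can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the function name `lookup_kk` and the awk variables `sect` and `key` are hypothetical. It compares tag names as fields (so indented `<mm>` lines also work), resets the section flag on every `<mm>` tag, and forgets a matched key when its `<lv>` block closes, while still tolerating blank lines between `<lkid>` and `<kk>`.

```shell
# lookup_kk SECTION KEY  (hypothetical helper; reads the XML on stdin)
# For every <gp> section whose <mm> is exactly SECTION, prints the <kk>
# value of the first <lv> whose <lkid> contains KEY.
lookup_kk() {
    awk -F'[<>]' -v sect="$1" -v key="$2" '
        $2 == "mm"           { f = ($3 == sect); q = 0 }  # new section: arm f only for SECTION
        $2 == "/lv"          { q = 0 }                    # <lv> ended: forget any KEY seen in it
        f && index($0, key)  { q = 1 }                    # line containing KEY seen
        f && q && $2 == "kk" { print $3; f = q = 0 }      # first <kk> after it, blank lines allowed
    '
}

# Usage: lookup_kk LongTime 'Desti = Motion' < file
```

Using index() instead of a regex means the key is matched literally, so characters such as `.` or `*` in it need no escaping.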

Thanks, Jotne. But what if there is a new line between the upper and lower tags, for example <lkid>StartEle=ONE, Desti = Motion</lkid> ---- new line here ---- <kk>12</kk>? I could add two getline calls because I know a new line is there, but this has to be dynamic, since other such tags might have intervening spaces or new lines that are not known in advance. Please provide your input. — user3177377, Jan 09 '14 at 12:29
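Another way to make the lookup independent of line breaks is to let awk read one whole `<gp>` section per record, so newlines between `<lkid>` and `<kk>` no longer matter. This is a minimal sketch, assuming an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves a multi-character RS unspecified); the function name `lookup_kk_rs` is hypothetical.

```shell
# lookup_kk_rs SECTION KEY  (hypothetical helper; reads the XML on stdin)
# Assumes gawk or mawk, because RS is longer than one character.
lookup_kk_rs() {
    awk -v sect="$1" -v key="$2" '
        BEGIN { RS = "</gp>" }    # each record is one whole <gp> section
        # only the section whose <mm> is exactly SECTION, and only if KEY occurs in it
        index($0, "<mm>" sect "</mm>") && (p = index($0, key)) {
            rest = substr($0, p)              # text from KEY onwards, newlines included
            s = index(rest, "<kk>")
            if (s == 0) next                  # no <kk> after KEY in this section
            rest = substr(rest, s + 4)        # skip past "<kk>"
            e = index(rest, "</kk>")
            if (e > 0) print substr(rest, 1, e - 1)
        }
    '
}

# Usage: lookup_kk_rs LongTime 'Desti = Motion' < file
```

The value is taken from the first `<kk>` that follows the key inside the same section, so any amount of whitespace or blank lines between the two tags is skipped.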

score 0 · Answer 2 · answered Jan 09 '14 at 12:39

sed -n '\|<mm>LongTime</mm>|,\|</gp>| {
   \|Desti = Motion</lkid>|,\|</kk>| {
      /<kk>/ s|</\{0,1\}[^>]*>||gp
      }
   }' YourFile

This works on your sample XML. If the format changes, specify which kind of change you expect (the extra-newline case is handled here). [Use --posix with GNU sed.]

score 0 · Answer 3 · answered Jan 09 '14 at 13:14

With GNU Awk version 4, you could try something like:

gawk -f a.awk file.xml

where a.awk is:

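BEGIN {
    RS="^$"     # slurp the whole file into a single record
    # split the record into fields that are either the LongTime <mm> tag,
    # an <lkid> element, or a <kk> element; everything else is ignored
    FPAT="(<mm>LongTime</mm>)|(<lkid>[^<]*</lkid>)|(<kk>[^<]*</kk>)"
}
{
    do {
        if ($(++i)=="<mm>LongTime</mm>") {
            do {
                # first <lkid> after LongTime that mentions Desti = Motion
                if ($(++i)~/<lkid>.*Desti = Motion.*<\/lkid>/) {
                    # the next field is its <kk>; capture the value (3-argument match is gawk-only)
                    match ($(i+1),/<kk>([^<]*)<\/kk>/,a)
                    print a[1]
                    exit
                }
            } while (i<=NF)
        }
    } while (i<=NF)
}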

