I have this XML file:

<gp>
<mms>1110012</mms>
<tg>988</tg>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>12</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>2</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

<gp>
<mms>2221100</mms>
<tg>989</tg>
<mm>LongVelocity</mm>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>772</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>900</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

Now I need to first search for "LongTime". If it is found, I then have to look for the value "Desti = Motion" (which is inside <lkid>StartEle=ONE, Desti = Motion</lkid>) within the nested <lv> sub-tags of that section. If that is also found, I finally need the value inside the tag just below it, which is 12 (<kk>12</kk>).

Please help. Any tool will do: awk, sed, grep, or anything else.

Thanks in advance.

  • Try looking at this answer: http://stackoverflow.com/questions/4680143/how-to-parse-xml-using-shellscript – mikea Jan 09 '14 at 12:06
  • When I parse an XML stream, I prefer tools that are designed for it. Many scripting languages and commands support a DOM approach, XPath queries, and so on: Perl (shipped with most Linux distributions), Python, PHP (there is a PHP interpreter that lets you write shell scripts in PHP), xmllint, etc. – Idriss Neumann Jan 09 '14 at 12:40

3 Answers

Using awk

awk -F"[<>]" '/LongTime/ {f=1} f && /Desti = Motion/ {getline;print $3;f=0}' file
12

This searches for LongTime and, if it is found, sets the flag f=1.
If f is set and Desti = Motion is found, it reads the next line with getline, prints the value, and resets f.


To make sure it does not print a Desti = Motion value from a later section when the LongTime section does not contain one, reset the flag f whenever a new section is not LongTime by adding /^<mm>/ && !/LongTime/ {f=0}:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {getline;print $3;f=0}' file
12

To avoid using getline in case of extra blank lines, use this:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {q=1} f && q && /<kk>/ {print $3;f=q=0}' file
12

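This just adds an extra test: the flag q is set when Desti = Motion is seen, and the value is printed at the next <kk> line, however many lines lie in between.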

Here is a more readable version:

awk -F"[<>]" '
    /LongTime/              {f=1}
    /^<mm>/ && !/LongTime/  {f=0}
    f && /Desti = Motion/   {q=1} 
    f && q && /<kk>/        {print $3;f=q=0}
    ' file
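
If the section name and the <lkid> text need to vary, the same two-flag logic can be wrapped in a small shell function. This is only a sketch: find_kk and its arguments are names made up for illustration, and it assumes each <mm> and <kk> element sits on a single line, as in the sample.

```shell
#!/bin/sh
# find_kk is a hypothetical helper name, not part of the original one-liner.
# Usage: find_kk SECTION LKID_TEXT FILE
find_kk() {
    awk -F'[<>]' -v sec="$1" -v key="$2" '
        # every <mm> line starts a new section: set f only for the wanted one
        /<mm>/              { f = (index($0, "<mm>" sec "</mm>") > 0); q = 0 }
        # remember that the wanted text was seen inside that section
        f && index($0, key) { q = 1 }
        # the next <kk> line holds the value; blank lines in between are skipped
        f && q && /<kk>/    { print $3; exit }
    ' "$3"
}
```

With the sample saved as file, find_kk LongTime 'Desti = Motion' file should print 12, including when blank lines separate the <lkid> and <kk> lines.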
Jotne
  • Thanks, Jotne. But what if there is a new line between the upper and lower tags, for example: StartEle=ONE, Desti = Motion ---- new line here --- 12? I could add two "getline;" calls when I know a new line is there, but this has to be dynamic, since other such tags might have intervening spaces or new lines that are not known in advance. Please provide your inputs. – user3177377 Jan 09 '14 at 12:29
sed -n '\|<mm>LongTime</mm>|,\|</gp>| {
   \|Desti = Motion</lkid>|,\|</kk>| {
      /<kk>/ s|</\{0,1\}[^>]*>||gp
      }
   }' YourFile

This works on your sample XML, but if the format changes, specify which kind of change you expect (the case of an extra new line is already handled here). [Use --posix with GNU sed.]

NeronLeVelu

In GNU Awk version 4, you could try something like:

gawk -f a.awk file.xml

where a.awk is:

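BEGIN {
    RS="^$"    # read the whole file as a single record
    # each field is one of the three elements we care about
    FPAT="(<mm>LongTime</mm>)|(<lkid>[^<]*</lkid>)|(<kk>[^<]*</kk>)"
}
{
    do {
        # find the <mm>LongTime</mm> field
        if ($(++i)=="<mm>LongTime</mm>") {
            do {
                # then the first following <lkid> containing "Desti = Motion"
                if ($(++i)~/<lkid>.*Desti = Motion.*<\/lkid>/) {
                    # the next field should be its <kk>; print the captured value
                    match ($(i+1),/<kk>([^<]*)<\/kk>/,a)
                    print a[1]
                    exit
                }
            } while (i<=NF)
        }
    } while (i<=NF)
}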
Håkon Hægland