One XML message from the big XML file

Question

Each message in my file starts with a header node, <Fund ...>, and ends with a footer node, </Fund>. Every message also carries an ID, for example 33969871 (see the script below), and I wrote the script to retrieve the message associated with a given ID.

Given an ID, the bash command should find it, walk back to the top of that message (i.e. <Fund), then forward to the bottom (i.e. </Fund>), and print that XML message.

Input file

<Fund LastUpdate="2017-05-23T10:32:53.563000000">   
<ID>13779321</ID>    
</Fund>    
<Fund LastUpdate="2017-05-23T10:32:53.563000000">    
<ID>13779322</ID>    
</Fund>    
<Fund LastUpdate="2017-05-23T10:32:53.563000000">    
<ID>13779323</ID>    
</Fund>

My awk command

/usr/xpg4/bin/awk '/\<Fund/{flag=1;found=j=0; delete a}
  flag{a[++j]=$0}                            /'33969781'/ && flag{found=1}        
       /\<\/Fund>/{flag=0                      # Ending pattern & found show our array
               if(found){for (i=1;i<=j;i++){
                          print a[i]}}}' ABC_866.xml

But I do not get the results.

[Don't use regex to parse context sensitive languages](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Use an XML parser instead, like xmlstarlet. — Aserre, Jan 09 '18 at 14:09
Your XML tags were missing in this question, because you didn't use the preview window prior to submitting it. Please always preview questions and ensure they are actually readable before publishing - this will save volunteers from needing to repair your question. — halfer, Jan 12 '18 at 21:58

score 1 · Answer 1 · answered Jan 09 '18 at 15:24

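You could use the xpath command-line tool. Double quotes around the ID keep the shell from ending the single-quoted expression early:

xpath -q -e '//Fund/ID[text()="13779321"]/..' test.xml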

prints

<Fund LastUpdate="2017-05-23T10:32:53.563000000">   
  <ID>13779321</ID>    
</Fund>

for this input, where the Fund elements are wrapped in a single root element:

<root>
  <Fund LastUpdate="2017-05-23T10:32:53.563000000">   
   <ID>13779321</ID>    
  </Fund>    
  <Fund LastUpdate="2017-05-23T10:32:53.563000000">    
    <ID>13779322</ID>    
   </Fund>    
  <Fund LastUpdate="2017-05-23T10:32:53.563000000">    
    <ID>13779323</ID>    
  </Fund>  
</root>

score 0 · Answer 2 · answered Jan 09 '18 at 14:48

You can do it with a single grep statement:

ABC_866.xml:

<Fund LastUpdate="2017-05-23T10:32:53.563000000">   
<ID>13779321</ID>    
</Fund>    
<Fund LastUpdate="2017-05-23T10:32:53.563000000">    
<ID>13779322</ID>    
</Fund>    
<Fund LastUpdate="2017-05-23T10:32:53.563000000">    
<ID>13779323</ID>    
</Fund>

Grep command and output:

# grep -B 1 -A 1 13779322 ABC_866.xml
<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>13779322</ID>
</Fund>

Explanation of the options:

-B 1 : print one line before each matching line

-A 1 : print one line after each matching line
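The one-line window only holds while every record is exactly three lines long with the ID in the middle. If a record can carry more child elements, buffering the lines between the opening and closing tags (the idea behind the script in the question) is one alternative. Below is a minimal sketch, assuming each <Fund ...> and </Fund> tag sits on its own line and records are not nested. The helper name `extract_fund`, the file name `sample_funds.xml`, and the extra `<Name>` element are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the whole <Fund>...</Fund> block that contains a given ID.
# Assumes <Fund ...> and </Fund> each sit on their own line, with no nesting.
# extract_fund is a hypothetical helper name, not part of any tool.
extract_fund() {
    # $1 = ID to look for, $2 = input file
    awk -v id="$1" '
        /<Fund[ >]/ { n = 0; found = 0; inside = 1 }   # opening tag: start a new buffer
        inside      { buf[++n] = $0 }                  # collect every line of the record
        inside && index($0, "<ID>" id "</ID>") { found = 1 }
        /<\/Fund>/  {
            if (inside && found)                       # closing tag: print only on a match
                for (i = 1; i <= n; i++) print buf[i]
            inside = 0
        }
    ' "$2"
}

# Small sample file; the <Name> element is invented to make records longer.
cat > sample_funds.xml <<'EOF'
<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>13779321</ID>
<Name>A</Name>
</Fund>
<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>13779322</ID>
<Name>B</Name>
</Fund>
EOF

extract_fund 13779322 sample_funds.xml
```

Passing the ID with -v avoids breaking out of the single-quoted awk program, and matching the full <ID> element keeps an ID from matching as a substring of a longer one. This is still line-oriented text matching, not XML parsing, so it depends on the layout assumptions above.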

score 0 · Answer 3 · answered Jan 09 '18 at 17:54

With gawk's multi-character RS support, and assuming the files are formatted as shown:

$ awk -v RS='</Fund>' '/13779321/{print $0 RT}' file

<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>13779321</ID>
</Fund>
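One caveat with a bare /13779321/ pattern is that it matches those digits anywhere in the record, for instance inside a longer ID. A possible variant, sketched below, matches the complete <ID> element and takes the ID as a variable. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves that case unspecified) and re-appends RS rather than the gawk-only RT. The helper name `fund_by_id`, the file name `funds_sample.xml`, and the ID 137793210 are invented for the example.

```shell
#!/bin/sh
# Sketch: record-based lookup that matches the whole <ID> element.
# Assumes an awk that accepts a multi-character RS (gawk and mawk do).
# fund_by_id is a hypothetical helper name.
fund_by_id() {
    # $1 = ID to look for, $2 = input file
    awk -v RS='</Fund>' -v id="$1" '
        index($0, "<ID>" id "</ID>") {
            sub(/^[ \t\r\n]+/, "")   # drop whitespace left over after the previous record
            print $0 RS              # re-append the closing tag consumed as the separator
        }
    ' "$2"
}

# Sample data; the second ID is invented to show the substring case.
cat > funds_sample.xml <<'EOF'
<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>13779321</ID>
</Fund>
<Fund LastUpdate="2017-05-23T10:32:53.563000000">
<ID>137793210</ID>
</Fund>
EOF

fund_by_id 13779321 funds_sample.xml
```

With this sample, only the first record should be printed, since the second ID merely starts with the same digits.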

karakfa
