I need to split a log file containing 4 XML nodes into 4 files. Given a file like this:

sddskjdsfds
asdadsa
20:15:12 st: <aRoot>
   <aNode>v</aNode>
   <otherNode a="2"/>
</aRoot>
kjfsdfj
20:15:59 r: <otherRoot>
   <bNode>h</bNode>
</otherRoot>
sddskjdsfds
asdadsa
22:31:32 st: <aRoot>
   <aNode>a</aNode>
   <otherNode a="1"/>
</aRoot>
kjfsdfj
22:31:39 r: <otherRoot>
   <bNode>o</bNode>
</otherRoot>
other-random-lines

I need to split it into 4 files: aRoot_1.xml, aRoot_2.xml, otherRoot_1.xml, otherRoot_2.xml.

So far I've achieved:

awk '/st:/,/<\/aRoot>/' file.txt > all_aRoots.txt

And a similar approach for <otherRoot>: another call to awk, writing to all_otherRoots.txt, etc.

But that keeps all the characters before <aRoot> on the opening line, and puts every <aRoot> block into the same output file.

How do I split a log file with 4 XML nodes into 4 files using Bash?

UPDATE #1: Please mind the non-XML lines: they must be excluded and, if possible, lines where the XML is preceded by non-XML text should keep only the XML part.

UPDATE #2: A sample output file from RavinderSingh13's answer:

sddskjdsfds
asdadsa
20:15:12 st: <aRoot>
   <aNode>v</aNode>
   <otherNode a="2"/>
Diego Shevek
  • Use the right tool for the job. [How to parse XML in Bash?](https://stackoverflow.com/q/893585/608639), [How to parse XML using shellscript?](https://stackoverflow.com/q/4680143/608639), etc. – jww Sep 26 '19 at 02:47
  • @jww The input file IS NOT an XML file; it is a log file with many different XML documents among other non-XML lines, as in the example I gave – Diego Shevek Sep 26 '19 at 03:05

1 Answer

Could you please try the following?

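awk -F"[><]" '
# A line starting with a closing tag: write the buffered lines to a file and reset.
/^<\//{
  out_file=ind"_"array[ind]".xml"
  print val > (out_file)
  close(out_file)
  val=ind=""
}
# A timestamped line: the root tag name is the second-to-last field; bump its counter.
/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/{
  ind=$(NF-1)
  array[$(NF-1)]++
}
# Append every line, XML or not, to the buffer.
{
  val=(val?val ORS:"")$0
}
'  Input_file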


EDIT: Adding code to remove the leading unwanted lines, as requested by the OP.

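awk -F"[><]" '
# A line starting with a closing tag: write the buffer (if any) and reset.
/^<\//{
  out_file=ind"_"array[ind]".xml"
  flag=1    # buffering is enabled only from the first closing tag onward
  if(val){
    print val > (out_file)
  }
  close(out_file)
  val=ind=""
}
# A timestamped line: take the root tag name and bump its counter.
/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/{
  ind=$(NF-1)
  array[$(NF-1)]++
}
# Append the current line to the buffer once flag is set.
flag{
  val=(val?val ORS:"")$0
}
'  Input_file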
RavinderSingh13
  • Thanks for your update, it didn't work (created just 3 files, 1 for aRoot, 2 for otherRoot); I'll change my approach and stop trying a one-liner solution – Diego Shevek Sep 26 '19 at 11:55
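
For reference, here is a minimal sketch of one way to get the four files described in the question. It is not taken from the answer above. It assumes that every XML document opens on a timestamped line (HH:MM:SS followed by some prefix and then the root tag) and that the closing root tag starts a line of its own. The variable names inside, root, count and out are illustrative only, and the heredoc simply recreates the sample log from the question as file.txt.

```shell
#!/bin/sh
# Recreate the sample log from the question.
cat > file.txt <<'EOF'
sddskjdsfds
asdadsa
20:15:12 st: <aRoot>
   <aNode>v</aNode>
   <otherNode a="2"/>
</aRoot>
kjfsdfj
20:15:59 r: <otherRoot>
   <bNode>h</bNode>
</otherRoot>
sddskjdsfds
asdadsa
22:31:32 st: <aRoot>
   <aNode>a</aNode>
   <otherNode a="1"/>
</aRoot>
kjfsdfj
22:31:39 r: <otherRoot>
   <bNode>o</bNode>
</otherRoot>
other-random-lines
EOF

awk '
# Outside a document, a timestamped line containing "<" starts a new one.
!inside && /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9] .*</ {
  sub(/^[^<]*/, "")            # drop the non-XML prefix, keep from "<" on
  root = $0
  sub(/^</, "", root)          # strip the leading "<"
  sub(/[ \t>].*$/, "", root)   # keep only the tag name
  out = root "_" (++count[root]) ".xml"
  inside = 1
}
# Inside a document, copy lines until the closing root tag.
inside {
  print > out
  if ($0 ~ ("^</" root ">")) {
    close(out)
    inside = 0
  }
}
' file.txt
```

Lines outside a document are never printed, so the random log lines are dropped, and the counter per root name gives the _1, _2 suffixes. A document whose root tag opens and closes on the same line is not handled by this sketch.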