I need to split a log file containing 4 XML nodes into 4 files. Given a file like this:
sddskjdsfds
asdadsa
20:15:12 st: <aRoot>
<aNode>v</aNode>
<otherNode a="2"/>
</aRoot>
kjfsdfj
20:15:59 r: <otherRoot>
<bNode>h</bNode>
</otherRoot>
sddskjdsfds
asdadsa
22:31:32 st: <aRoot>
<aNode>a</aNode>
<otherNode a="1"/>
</aRoot>
kjfsdfj
22:31:39 r: <otherRoot>
<bNode>o</bNode>
</otherRoot>
other-random-lines
I need to split it into 4 files: aRoot_1.xml, aRoot_2.xml, otherRoot_1.xml, otherRoot_2.xml.
So far I've achieved:
awk '/st:/,/<\/aRoot>/' file.txt > all_aRoots.txt
And a similar approach for <otherRoot>: another call to awk, writing to all_otherRoots.txt, etc.
But that keeps the non-XML characters at the start of the <aRoot> line, and puts all the <aRoot> nodes into a single output file.
How do I split a log file with 4 XML nodes into 4 files using Bash?
UPDATE #1: Please mind the non-XML lines: they must be excluded. If possible, lines where the XML is preceded by non-XML text should keep only the XML part.
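To make the requirement in UPDATE #1 concrete, here is a minimal awk sketch of the behaviour I am after. It is only an illustration under some assumptions: the root tag names (aRoot, otherRoot) are hard-coded, each opening tag has no attributes, blocks are never nested, and the sample input written to file.txt is a shortened, hypothetical version of the log above.

```shell
# Hypothetical sample input; file.txt is the file name used in the awk call above.
cat > file.txt <<'EOF'
sddskjdsfds
20:15:12 st: <aRoot>
<aNode>v</aNode>
<otherNode a="2"/>
</aRoot>
kjfsdfj
20:15:59 r: <otherRoot>
<bNode>h</bNode>
</otherRoot>
asdadsa
22:31:32 st: <aRoot>
<aNode>a</aNode>
<otherNode a="1"/>
</aRoot>
22:31:39 r: <otherRoot>
<bNode>o</bNode>
</otherRoot>
other-random-lines
EOF

awk '
# Opening line of a block, e.g. "20:15:12 st: <aRoot>".
# Assumption: only these two root names occur.
!inblock && match($0, /<(aRoot|otherRoot)>/) {
    root = substr($0, RSTART + 1, RLENGTH - 2)   # tag name without < and >
    out  = root "_" (++count[root]) ".xml"       # e.g. aRoot_1.xml
    $0   = substr($0, RSTART)                    # drop the non-XML prefix
    inblock = 1
}
# Lines outside a block are skipped; lines inside go to the current file.
inblock {
    print > out
    if (index($0, "</" root ">")) {              # closing tag ends the block
        close(out)
        inblock = 0
    }
}
' file.txt
```

With this input, the sketch should leave aRoot_1.xml, aRoot_2.xml, otherRoot_1.xml and otherRoot_2.xml in the current directory, each holding one node with no timestamp prefix and no surrounding non-XML lines.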
UPDATE #2: A sample output file from RavinderSingh13's answer:
sddskjdsfds
asdadsa
20:15:12 st: <aRoot>
<aNode>v</aNode>
<otherNode a="2"/>