I have hundreds of XML files in a directory. They all have exactly the same structure. I want to merge some of their nodes together and keep the rest as it is.
Example xml 1
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
<name>imglab dataset</name>
<comment>Created by imglab tool.</comment>
<images>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0420.jpg'></image>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0544.jpg'>
<box top='343' left='72' width='92' height='29'>
<label>LBS_Marks
</label></box></image>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0489.jpg'></image>
</images>
</dataset>
Example xml 2
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
<name>imglab dataset</name>
<comment>Created by imglab tool.</comment>
<images>
<image file="/home/orcl/user102339/Area123/Geo_Tag_0812-0420.jpg">
<box top="505" left="326" width="59" height="32">
<label>SBS_Marks</label>
</box>
</image>
<image file="/home/orcl/user102339/Area123/Geo_Tag_0812-0544.jpg">
<box top="507" left="331" width="50" height="27">
<label>SBS_Marks</label>
</box>
</image>
<image file="/home/orcl/user102339/Area123/Geo_Tag_0812-0489.jpg">
<box top="509" left="330" width="51" height="25">
<label>SBS_Marks</label>
</box>
</image>
</images>
</dataset>
Both data sets contain the same images, but the markings differ. For example, in the first file the first image (0420.jpg) has no box elements, while the same image in the second file has a box element with the label SBS_Marks. I am trying to merge these files so that each image ends up with the box coordinates and labels found for it in any of the files. The desired output is as follows:
<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>
<dataset>
<name>imglab dataset</name>
<comment>Created by imglab tool.</comment>
<images>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0420.jpg'>
<box top="505" left="326" width="59" height="32">
<label>SBS_Marks</label>
</box>
</image>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0544.jpg'>
<box top='343' left='72' width='92' height='29'>
<label>LBS_Marks
</label></box>
<box top="507" left="331" width="50" height="27">
<label>SBS_Marks</label>
</box>
</image>
<image file='/home/orcl/user102339/Area123/Geo_Tag_0812-0489.jpg'>
<box top="509" left="330" width="51" height="25">
<label>SBS_Marks</label>
</box>
</image>
</images>
</dataset>
In the desired output, the first image (0420.jpg) has the box and label from the second file; the second image (0544.jpg) has two boxes and labels, one from each file; and the third image (0489.jpg) has the box and label from the second file.
I tried using this code:
#!/usr/bin/env python
import sys
from xml.etree import ElementTree

def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        print ElementTree.tostring(first)

if __name__ == "__main__":
    run(sys.argv[1:])
However, this just prints the contents of the files one after the other; it does not merge them. I don't know how to write an XSL template, so I could not try that approach. Can someone suggest better code for this, or provide an XSL template that merges all the files in the folder?
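To make the intended merge rule concrete, here is a minimal sketch in Python 3, assuming every file has a single images element under the root and that images are matched by the exact value of their file attribute. The function names merge_datasets and merge_files are hypothetical, chosen only for illustration. It copies the first file as the base and appends the box elements from every other file to the image with the same file attribute.

```python
import copy
import xml.etree.ElementTree as ET


def merge_datasets(roots):
    """Merge <box> elements of <image> nodes that share a 'file' attribute.

    `roots` is a non-empty list of parsed <dataset> root elements.
    The inputs are left unmodified; a new merged tree is returned.
    """
    merged = copy.deepcopy(roots[0])       # first file is the base
    images = merged.find("images")         # assumption: exactly one <images>
    by_file = {img.get("file"): img for img in images.findall("image")}

    for root in roots[1:]:
        for img in root.iterfind("images/image"):
            key = img.get("file")
            target = by_file.get(key)
            if target is None:
                # Image not seen before: add an empty <image> for it.
                target = ET.SubElement(images, "image", {"file": key})
                by_file[key] = target
            for box in img.findall("box"):
                target.append(copy.deepcopy(box))
    return merged


def merge_files(paths):
    """Parse each path and return the merged XML as a string."""
    roots = [ET.parse(path).getroot() for path in paths]
    return ET.tostring(merge_datasets(roots), encoding="unicode")
```

It could be called as merge_files(sorted(glob.glob("*.xml"))) after importing glob. Note that ElementTree's default parser does not keep the xml-stylesheet processing instruction, so that line would have to be written back to the output by hand, and the result is not re-indented.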