Python: How to combine data from multiple XML files

Question

I am using ElementTree to parse XML files. I have multiple XML files whose elements are identified by a unique key (SKU), but the other tags differ from file to file. I want to combine the tags that belong to each element into another file. To do this, I can parse each child element of the first XML and loop through the child elements of the other files to find the element with the given SKU:

import xml.etree.ElementTree as ET

tree = ET.parse(filename)
root = tree.getroot()
tree1 = ET.parse(filename1)
root1 = tree1.getroot()
...  # more XMLs
for child in root:
    sku = child.find('SKU').text
    # Rescans the whole second file for every element of the first
    for child1 in root1:
        sku1 = child1.find('SKU').text
        if sku == sku1:
            pass  # do something

But I realize that this method is not very efficient. Is there a better way of doing this?

Thanks

EDIT: For example, the first XML has elements of the following form:

<product>
    <SKU>ABCD1234</SKU>
    <_Image>something</_Image>
    <_Image_Count>2</_Image_Count>
    <_Image2>something</_Image2>
    <_Image3>something</_Image3>
    <_Orignal_Image>something</_Orignal_Image>
</product>

and the second XML has elements of the following form:

<product>
    <Product_Code>ABCD1234</Product_Code>
    <Designer>xxx</Designer>
    <Taxon>yyy</Taxon>
    <Parent_Taxon>zzz</Parent_Taxon>
    <Taxonomy>aaa</Taxonomy>
    <Quantity>1</Quantity>
    <Cost>2</Cost>
    <MRP>3</MRP>
    <Price>4</Price>
</product>

I want to combine the 2 XMLs to get:

<product>
    <SKU>ABCD1234</SKU>
    <_Image>something</_Image>
    <_Image_Count>2</_Image_Count>
    <_Image2>something</_Image2>
    <_Image3>something</_Image3>
    <_Orignal_Image>something</_Orignal_Image>
    <Product_Code>ABCD1234</Product_Code>
    <Designer>xxx</Designer>
    <Taxon>yyy</Taxon>
    <Parent_Taxon>zzz</Parent_Taxon>
    <Taxonomy>aaa</Taxonomy>
    <Quantity>1</Quantity>
    <Cost>2</Cost>
    <MRP>3</MRP>
    <Price>4</Price>
</product>

I am embarrassed to say I am not following - it might be helpful to add a brief example of the data and the output. — PyNEwbie, Mar 08 '14 at 19:34

score 1 · Answer 1 · answered Mar 08 '14 at 19:49

Write a class to manage each type of XML file. It should have a method that takes a list of SKUs and returns a collection of objects holding the properties you are interested in from that file.

It should have another method that takes such a collection and uses it to modify the XML the class owns.

ElementTree has limited XML support, but looking at your example files, the findall method would be a good start for getting a collection of 'sku' nodes.

Don't try and do it all in one go, and opening up every file and using nested loops is definitely not the way to go.
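One way this could look is sketched below. It is only an illustration under some assumptions: the class name ProductFile and its methods fields_for and add_fields are made up for this example, each file is assumed to have its product elements directly under the root, and the sample strings are trimmed-down versions of the XML in the question. The index is built once per file, so matching by key is a dictionary lookup instead of a nested loop.

```python
import xml.etree.ElementTree as ET


class ProductFile:
    """One XML document with its <product> elements indexed by a key tag.

    Hypothetical helper for illustration; not part of ElementTree.
    """

    def __init__(self, root, key_tag):
        self.root = root
        # Build the index once so each lookup is a dict access, not a scan.
        self.by_key = {}
        for product in root.findall("product"):
            key = product.findtext(key_tag)
            if key is not None:
                self.by_key[key.strip()] = product

    def fields_for(self, keys):
        """Return {key: [(tag, text), ...]} for the requested keys found here."""
        found = {}
        for key in keys:
            product = self.by_key.get(key)
            if product is not None:
                found[key] = [(child.tag, child.text) for child in product]
        return found

    def add_fields(self, fields):
        """Append the given (tag, text) pairs to the matching products."""
        for key, pairs in fields.items():
            product = self.by_key.get(key)
            if product is None:
                continue
            for tag, text in pairs:
                ET.SubElement(product, tag).text = text


# Trimmed-down versions of the two samples from the question,
# wrapped in an assumed <products> root element.
FIRST_XML = """<products><product>
    <SKU>ABCD1234</SKU><_Image>something</_Image>
</product></products>"""
SECOND_XML = """<products><product>
    <Product_Code>ABCD1234</Product_Code><Designer>xxx</Designer>
</product></products>"""

first = ProductFile(ET.fromstring(FIRST_XML), "SKU")
second = ProductFile(ET.fromstring(SECOND_XML), "Product_Code")

# Pull the fields for the keys the first file knows about, then merge them in.
first.add_fields(second.fields_for(first.by_key))
merged_tags = [child.tag for child in first.by_key["ABCD1234"]]
```

With real files you would pass ET.parse(filename).getroot() instead of ET.fromstring(...), and write the result back out with ET.ElementTree(first.root).write(...).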

score 1 · Answer 2 · edited May 23 '17 at 11:49

I would do this differently. There are a couple of recipes for converting a Python dictionary to XML.

1. Read in each file and convert it into a dictionary of dictionaries, where the outer key is the SKU and the inner dictionary holds all of the other elements.
2. Create a master dictionary that combines the dictionaries from each file.
3. Create an XML file from the result.

If you need to preserve order, use ordered dictionaries (collections.OrderedDict).
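The steps above could be sketched roughly as follows. This is a minimal illustration, not a finished solution: the function names read_products, merge_products and to_xml are made up here, the product elements are assumed to sit directly under the root, and the sample strings are shortened versions of the XML in the question. Note that a dictionary keyed by tag name keeps only one value per tag, so repeated tags inside one product would be lost. On Python 3.7+ a plain dict already preserves insertion order.

```python
import xml.etree.ElementTree as ET


def read_products(xml_text, key_tag):
    """Return {key: {tag: text}} for every <product> directly under the root."""
    products = {}
    for product in ET.fromstring(xml_text).findall("product"):
        fields = {child.tag: child.text for child in product}
        key = fields.get(key_tag)
        if key is not None:  # skip products that lack the key tag
            products[key] = fields
    return products


def merge_products(*sources):
    """Combine several {key: {tag: text}} mappings into one master mapping."""
    master = {}
    for source in sources:
        for key, fields in source.items():
            master.setdefault(key, {}).update(fields)
    return master


def to_xml(master, root_tag="products"):
    """Build an element tree with one <product> per key in the master mapping."""
    root = ET.Element(root_tag)
    for fields in master.values():
        product = ET.SubElement(root, "product")
        for tag, text in fields.items():
            ET.SubElement(product, tag).text = text
    return root


# Shortened versions of the two samples from the question,
# wrapped in an assumed <products> root element.
FIRST_XML = """<products><product>
    <SKU>ABCD1234</SKU><_Image>something</_Image>
</product></products>"""
SECOND_XML = """<products><product>
    <Product_Code>ABCD1234</Product_Code><Designer>xxx</Designer>
</product></products>"""

master = merge_products(
    read_products(FIRST_XML, "SKU"),
    read_products(SECOND_XML, "Product_Code"),
)
combined_root = to_xml(master)
print(ET.tostring(combined_root, encoding="unicode"))
```

Each file is read exactly once, and the merge is a dictionary update per product, so the cost grows with the total number of elements rather than with the product of the file sizes.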

And as I am writing this, I think it might be easier to go from dictionary to JSON to XML.

score 1 · Answer 3 · answered Mar 08 '14 at 20:41

I would suggest looking at the BeautifulSoup library.

I wrote a small sample snippet for creating the combined XML.

from bs4 import BeautifulSoup

first = BeautifulSoup(open("first.xml"), "lxml")

first_as_dict = dict([(x.text, x.parent()) for x in first.find_all("sku")])

second = BeautifulSoup(open("second.xml"), "lxml")
# The key tag in your second sample XML is "product_code",
# not "SKU" as in the first one; change this if that is not correct
second_as_dict = dict([(x.text, x.parent()) for x in second.find_all("product_code")])

combined = BeautifulSoup("", "lxml")

for key, value in first_as_dict.items():  # .iteritems() exists only in Python 2
    product_tag = combined.new_tag("product")
    items = value + second_as_dict[key]
    for item in items:
        product_tag.append(item)
    combined.append(product_tag)

print(combined.prettify())
