I am using ElementTree to parse XML files. I have multiple XML files. The elements of the XML files are identified by a unique key (SKU), but the other tags differ. I want to combine the tags corresponding to each element into another file. To do this, I can parse each child element of the first XML and loop through the child elements of the other files to find the element with the given SKU:

import xml.etree.ElementTree as ET

tree = ET.parse(filename)
root = tree.getroot()
tree1 = ET.parse(filename1)
root1 = tree1.getroot()
# ... more XML files
for child in root:
    sku = child.find('SKU').text
    for child1 in root1:
        sku1 = child1.find('SKU').text
        if sku == sku1:
            pass  # do something with the matching pair

But I realize that this method is not very efficient. Is there a better way of doing this?

Thanks

EDIT: For example, the first XML has elements of the following form:

<product>
    <SKU>ABCD1234</SKU>
    <_Image>something</_Image>
    <_Image_Count>2</_Image_Count>
    <_Image2>something</_Image2>
    <_Image3>something</_Image3>
    <_Orignal_Image>something</_Orignal_Image>
</product>

and the second XML has elements of the following form:

<product>
    <Product_Code>ABCD1234</Product_Code>
    <Designer>xxx</Designer>
    <Taxon>yyy</Taxon>
    <Parent_Taxon>zzz</Parent_Taxon>
    <Taxonomy>aaa</Taxonomy>
    <Quantity>1</Quantity>
    <Cost>2</Cost>
    <MRP>3</MRP>
    <Price>4</Price>
</product>

I want to combine the 2 XMLs to get:

<product>
    <SKU>ABCD1234</SKU>
    <_Image>something</_Image>
    <_Image_Count>2</_Image_Count>
    <_Image2>something</_Image2>
    <_Image3>something</_Image3>
    <_Orignal_Image>something</_Orignal_Image>
    <Product_Code>ABCD1234</Product_Code>
    <Designer>xxx</Designer>
    <Taxon>yyy</Taxon>
    <Parent_Taxon>zzz</Parent_Taxon>
    <Taxonomy>aaa</Taxonomy>
    <Quantity>1</Quantity>
    <Cost>2</Cost>
    <MRP>3</MRP>
    <Price>4</Price>
</product>
nish

3 Answers

Write a class to manage each type of XML file. It should have a method that takes a list of SKUs and returns a collection of thingies with the properties you are interested in.

It should have another method that takes such a collection and uses it to modify the XML the class owns.

ElementTree has limited XML support, but looking at your example files, the findall method would be a good start for getting a collection of 'sku' nodes.

Don't try and do it all in one go, and opening up every file and using nested loops is definitely not the way to go.
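
A rough sketch of what such a class could look like. The class name ProductFile, the method names extract and apply, and the <products> root element are all made up for illustration; only the <product> children and the SKU / Product_Code tags come from the question.

```python
import xml.etree.ElementTree as ET


class ProductFile:
    """Hypothetical wrapper around one kind of product XML document."""

    def __init__(self, root, key_tag):
        self.root = root        # parsed root, e.g. ET.parse(path).getroot()
        self.key_tag = key_tag  # tag holding the key: 'SKU' or 'Product_Code'

    def extract(self, skus):
        """Return {sku: {tag: text}} for the requested SKUs."""
        wanted = set(skus)
        found = {}
        for product in self.root.findall('product'):
            sku = product.findtext(self.key_tag)
            if sku in wanted:
                found[sku] = {child.tag: child.text for child in product}
        return found

    def apply(self, properties):
        """Append tags from `properties` that a product does not already have."""
        for product in self.root.findall('product'):
            extra = properties.get(product.findtext(self.key_tag), {})
            existing = {child.tag for child in product}
            for tag, text in extra.items():
                if tag not in existing:
                    ET.SubElement(product, tag).text = text


# Tiny inline documents stand in for the real files (assumed root: <products>).
images = ProductFile(ET.fromstring(
    '<products><product><SKU>ABCD1234</SKU>'
    '<_Image>something</_Image></product></products>'), 'SKU')
details = ProductFile(ET.fromstring(
    '<products><product><Product_Code>ABCD1234</Product_Code>'
    '<Designer>xxx</Designer></product></products>'), 'Product_Code')

# Pull the details for one SKU and fold them into the image document.
images.apply(details.extract(['ABCD1234']))
merged = ET.tostring(images.root, encoding='unicode')
print(merged)
```

Each file is walked once per call, and the hand-off between files is a plain dictionary keyed by SKU, so no file needs to know how the others are laid out.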

Tony Hopkinson

I would do this differently. There are a couple of recipes for converting a Python dictionary to XML.

  1. Read in each file and convert it into a dictionary of dictionaries, where the outer key is the SKU and the inner dictionary holds all of the other elements.
  2. Create a master dictionary which combines the dictionaries from each file (see this about combining dictionaries)
  3. Create an XML file with the result

If you need to preserve order then use Ordered dictionaries.
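
The three steps above could be sketched roughly like this with ElementTree alone. The helper names (index_by_key, merge_indexes, to_xml) and the <products> root element are assumptions for illustration, not something from the question. On Python 3.7+ a plain dict already keeps insertion order, so an ordered dictionary is only needed on older versions.

```python
import xml.etree.ElementTree as ET


def index_by_key(root, key_tag):
    """Step 1: map each product's key text to {tag: text} for its children."""
    return {
        product.findtext(key_tag): {child.tag: child.text for child in product}
        for product in root.findall('product')
    }


def merge_indexes(*indexes):
    """Step 2: combine the per-file dictionaries into one master dictionary."""
    master = {}
    for index in indexes:
        for sku, fields in index.items():
            master.setdefault(sku, {}).update(fields)
    return master


def to_xml(master, root_tag='products'):
    """Step 3: build a new tree with one <product> per SKU."""
    root = ET.Element(root_tag)
    for fields in master.values():
        product = ET.SubElement(root, 'product')
        for tag, text in fields.items():
            ET.SubElement(product, tag).text = text
    return root


# Inline samples stand in for the real files (assumed root: <products>).
first = ET.fromstring(
    '<products><product><SKU>ABCD1234</SKU>'
    '<_Image_Count>2</_Image_Count></product></products>')
second = ET.fromstring(
    '<products><product><Product_Code>ABCD1234</Product_Code>'
    '<Price>4</Price></product></products>')

master = merge_indexes(index_by_key(first, 'SKU'),
                       index_by_key(second, 'Product_Code'))
combined = to_xml(master)
print(ET.tostring(combined, encoding='unicode'))
```

Every file is read once and matching is a dictionary lookup, which avoids the nested loops from the question. If two files share a tag name, the later file's value wins in this sketch.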

And as I am writing this, I think it might be easier to go from dictionary to JSON to XML.

PyNEwbie

I would suggest looking at the BeautifulSoup library.

Wrote a small sample snippet for creating the combined XML.

from bs4 import BeautifulSoup

# The "lxml" HTML parser lowercases tag names, hence "sku" rather than "SKU".
with open("first.xml") as f:
    first = BeautifulSoup(f, "lxml")

# Calling a tag is shorthand for find_all(), so x.parent() lists the tags inside <product>.
first_as_dict = {x.text: x.parent() for x in first.find_all("sku")}

with open("second.xml") as f:
    second = BeautifulSoup(f, "lxml")
# The actual tag name in your sample XML is "product_code",
# it's not "SKU" as in the first one; change this if that is not correct
second_as_dict = {x.text: x.parent() for x in second.find_all("product_code")}

combined = BeautifulSoup("", "lxml")

for key, value in first_as_dict.items():  # items(), not the Python 2-only iteritems()
    product_tag = combined.new_tag("product")
    items = value + second_as_dict[key]
    for item in items:
        product_tag.append(item)
    combined.append(product_tag)

print(combined.prettify())
asp