Working with XML and exporting names of nodes

Question

I wrote the code below. My XML file contains these nodes, in this order:

Assembly_1, Detail_1, Detail_2, Assembly_2, Detail_3

What I am trying to do is get the name of the assembly for each detail (Detail_1 and Detail_2 belong to Assembly_1, Detail_3 belongs to Assembly_2, and so on).

I have a lot of details, more than 200. The function below works, but it takes a long time because the XML file is parsed again on every call.

How can I make it run faster?

def CorrectAssembly(detail):

    from xml.dom import minidom

    xml_path = r"C:\Users\vblagoje\test_python_s2k\Load_Independent_Results\HSB53111-01-D_2008_v2-Final-Test-Cases_All_1.1.xml"
    mydoc=minidom.parse(xml_path)
    root = mydoc.getElementsByTagName("FEST2000")
    assembly=""

    for node in root:
        for childNodes in node.childNodes:
            if childNodes.nodeType == childNodes.TEXT_NODE: continue

            if childNodes.nodeName == "ASSEMBLY":
                assembly = childNodes.getAttribute("NAME")
            if childNodes.nodeName == "DETAIL":
                if detail == childNodes.getAttribute("NAME"):
                    break

    return assembly

score 0 · Answer 1 · answered Aug 31 '20 at 11:53

One solution is simply to read the XML file once, before looking up any of the details.
Something along these lines:

from xml.dom import minidom


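def CorrectAssembly(detail, root):
    """Return the name of the ASSEMBLY that precedes the given DETAIL."""

    for node in root:
        assembly = ""
        for child in node.childNodes:
            # skip whitespace text nodes, comments, etc.
            if child.nodeType != child.ELEMENT_NODE:
                continue

            if child.nodeName == "ASSEMBLY":
                assembly = child.getAttribute("NAME")
            elif child.nodeName == "DETAIL" and child.getAttribute("NAME") == detail:
                # return right away; a bare break would only leave the inner loop
                return assembly

    # detail not found
    return ""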


xml_path = r"C:\Users\vblagoje\test_python_s2k\Load_Independent_Results\HSB53111-01-D_2008_v2-Final-Test-Cases_All_1.1.xml"
mydoc=minidom.parse(xml_path)
root = mydoc.getElementsByTagName("FEST2000")

aDetail = "myDetail"
assembly = CorrectAssembly(aDetail, root)
anotherDetail = "myDetail2"
assembly = CorrectAssembly(anotherDetail, root)
# and so on

You still walk through (part of) the loaded XML every time you call the function, though. It may be better to build a dictionary once that maps each detail to its assembly, and then simply look the assembly up when you need it:

from xml.dom import minidom

# read the xml
xml_path = r"C:\Users\vblagoje\test_python_s2k\Load_Independent_Results\HSB53111-01-D_2008_v2-Final-Test-Cases_All_1.1.xml"
mydoc=minidom.parse(xml_path)
root = mydoc.getElementsByTagName("FEST2000")

detail_assembly_map = {}

# fill the dictionary
for node in root:
    assembly = ""  # avoids a NameError if a DETAIL comes before the first ASSEMBLY
    for childNodes in node.childNodes:
        if childNodes.nodeType == childNodes.TEXT_NODE: continue
        if childNodes.nodeName == "ASSEMBLY":
            assembly = childNodes.getAttribute("NAME")
        if childNodes.nodeName == "DETAIL":
            detail_assembly_map[childNodes.getAttribute("NAME")] = assembly

# use it
aDetail = "myDetail"
assembly = detail_assembly_map[aDetail]
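
To try the dictionary approach without the real file, here is a minimal sketch on a small inline sample. The sample XML is an assumption based on the node order given in the question (ASSEMBLY and DETAIL as siblings under FEST2000), and `build_detail_assembly_map` is just an illustrative helper name:

```python
from xml.dom import minidom

# Hypothetical sample mirroring the node order described in the question.
SAMPLE_XML = """<FEST2000>
  <ASSEMBLY NAME="Assembly_1"/>
  <DETAIL NAME="Detail_1"/>
  <DETAIL NAME="Detail_2"/>
  <ASSEMBLY NAME="Assembly_2"/>
  <DETAIL NAME="Detail_3"/>
</FEST2000>"""


def build_detail_assembly_map(doc):
    """Map each DETAIL name to the most recent preceding ASSEMBLY name."""
    mapping = {}
    for root in doc.getElementsByTagName("FEST2000"):
        assembly = ""  # reset per root so names do not leak between roots
        for child in root.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip whitespace text nodes and comments
            if child.nodeName == "ASSEMBLY":
                assembly = child.getAttribute("NAME")
            elif child.nodeName == "DETAIL":
                mapping[child.getAttribute("NAME")] = assembly
    return mapping


# Parse once, build the map once, then every lookup is a plain dict access.
detail_assembly_map = build_detail_assembly_map(minidom.parseString(SAMPLE_XML))

for name in ("Detail_1", "Detail_2", "Detail_3", "Detail_99"):
    # .get() returns None for an unknown detail instead of raising KeyError
    print(name, "->", detail_assembly_map.get(name))
```

With 200+ details, the XML is traversed only once this way, and using `.get()` lets you decide how to handle a detail that is not in the file.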

From your post it is not really clear how the XML is structured, but if the details are children of the assemblies, the mapping could be built differently: iterate over the assembly nodes first and, within each one, over its detail children. You would then no longer depend on the elements appearing in the right order.
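
If the file does turn out to be nested like that, the mapping could look roughly like the sketch below. The nested sample XML is purely an assumption for illustration; your real file may be laid out differently:

```python
from xml.dom import minidom

# Hypothetical nested layout: each DETAIL is a child of its ASSEMBLY.
NESTED_XML = """<FEST2000>
  <ASSEMBLY NAME="Assembly_1">
    <DETAIL NAME="Detail_1"/>
    <DETAIL NAME="Detail_2"/>
  </ASSEMBLY>
  <ASSEMBLY NAME="Assembly_2">
    <DETAIL NAME="Detail_3"/>
  </ASSEMBLY>
</FEST2000>"""

nested_doc = minidom.parseString(NESTED_XML)
nested_map = {}

for assembly_node in nested_doc.getElementsByTagName("ASSEMBLY"):
    assembly_name = assembly_node.getAttribute("NAME")
    # getElementsByTagName searches all descendants of this assembly node
    for detail_node in assembly_node.getElementsByTagName("DETAIL"):
        nested_map[detail_node.getAttribute("NAME")] = assembly_name

print(nested_map)
```

Here the parent/child relationship decides which assembly a detail belongs to, so the order of the elements in the file does not matter.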

This post could be helpful too, depending on the structure of your XML-tree.
