0

My question is not really a problem because the program works the way it is right now, however I'm looking for a way to improve the maintainability of it since the system is growing quite fast.

In essence, I have a function (let's call it 'a') that is going to process a XML (in a form of a python dict) and it is responsible for getting a specific array of elements (let's call it 'obj') from this XML. The problem is that we process a lot of XMLs from different sources, therefore, each XML has its own structure and the obj element is located in different places.

The code is currently in the following structure:

function a(self, code, ...):
    xml_doc = ... # this is a dict from a xml document that can have one of many different structures
    obj = None  # Array of objects that I want to get from the XML. It might be processed later but is eventually returned.

    #Because the XML can have different structures, the object I want to get can be placed in different (well-known) places depending on the code value.

    if code is 'a':
       obj = xml_doc["key1"]["key2"]
    elif code is 'b':
       obj = xml_doc["key3"]
       ...
       # code that processes the obj object
       ...
    elif code is 'b':
       obj = xml_doc["key4"]["key5"]["key6"]
    ... # elif for different codes goes indefinitely   

    return obj

As you can see (or not - but believe me), it's not very friendly to add new entries to this function and add code to the cases that have to be processed. So I was looking for a way to do it using dictionaries to map the code to the correct XML structure. Something in the direction of the following example:

...
    xml_doc = ...

    # That would be extremely neat. 
    code_to_pattern = {
        'a': xml_doc["key1"]["key2"],
        'b': xml_doc["key3"],
        'c': xml_doc["key4"]["key5"]["key6"],
        ...
    }

    obj = code_to_pattern[code]
    obj = self.process_xml(code, obj) # It will process the array if it has to in another function.
    return obj

...

However, the above code doesn't work for obvious reasons. Each entry of the code_to_pattern dictionary is trying to access an element in xml_doc that might not exist, then an exception is raised. I thought in adding the entries as strings and then using the exec() function, so python only interpret the string in the right moment, however I'm not very found of the exec function and I am sure someone can come up with a better idea.

The conditional processing part of the XML is easy to do, however I can't think in a better way to have an easy method to add new entries to the system.

I'd be very pleased if someone can help me with some ideas.

EDIT1: Thank you for your replies, guys. Your both (@jarondl and @holdenweb) gave me workable and working ideas. For the right answer I am going to choose the one that required me the less change in the format I gave you even though I am going to solve it through xPath.

2 Answers2

1

You should first consider alternatives such as xpath to read the xml, depending on how you parsed it.

If you want to proceed with your dictionary, you can have non evaluated code with lambda - no need for exec:

code_to_pattern = {
    'a': lambda doc: doc["key1"]["key2"],
    'b': lambda doc: doc["key3"],
    'c': lambda doc: doc["key4"]["key5"]["key6"],
    ...
}
obj = code_to_pattern[code](xml_doc)
jarondl
  • 1,593
  • 4
  • 18
  • 27
0

Essentially you are looking for a data-driven solution. You have the essence of such a solution, but rather than mapping the codes to elements of the xml_doc table, it might be easier to map the codes to the required keys. In other words, look at doing:

xml_doc = ...

code_to_pattern = {
    'a': "key1", "key2",
    'b': "key3",
    'c': "key4", "key5", "key6",
    ...
}

The problem there is that you would then need to adapt to the variable number of keys that the different objects mapped to, so a simple

obj = code_to_pattern[code]

wouldn't cut it. Be aware, though, that dicts can take tuples as arguments, so it's possible (though you'd know better than I) that rather than using successive indices like xml_doc["key4"]["key5"]["key6"] you might be able to use tuple indices like xml_doc["key4", "key5", "key6"]. This may or may not help you with your problem.

Finally, you might find it helpful to learn about the collections.defaultdict object, since this automates the creation of new entries rather than forcing you to test for their presence and create them if absent. This could be helpful even in tuple keys won't cut it for you.

holdenweb
  • 33,305
  • 7
  • 57
  • 77