
I have a Python dict that looks like this:

{'data': [{'data': [{'data': 'gen1', 'name': 'objectID'},
                   {'data': 'familyX', 'name': 'family'}],
          'name': 'An-instance-of-A'},
         {'data': [{'data': 'gen2', 'name': 'objectID'},
                   {'data': 'familyY', 'name': 'family'},
                   {'data': [{'data': [{'data': '21',
                                        'name': 'objectID'},
                                       {'data': 'name-for-21',
                                        'name': 'name'},
                                       {'data': 'no-name', 'name': None}],
                              'name': 'An-instance-of-X:'},
                             {'data': [{'data': '22',
                                        'name': 'objectID'}],
                              'name': 'An-instance-of-X:'}],
                    'name': 'List-of-2-X-elements:'}],
          'name': 'An-instance-of-A'}],
'name': 'main'}

The structure repeats, following these rules:

  • A dict contains 'name' and 'data'
  • 'data' can contain a list of dicts
  • If 'data' is not a list, it is a value I need.
  • 'name' is just a name

The problem is that for each value, I also need the info from each of its parents.

So in the end, I need to print a list with items that look something like:

objectID=gen2 family=familyY An-instance-of-X_objectID=21 An-instance-of-X_name=name-for-21

Edit: This is only one of several lines I want as the output. I need one line like this for each item whose 'data' is a plain value rather than a list of dicts.

So, for each 'data' that is a plain value, traverse up, collect the parents' info and print it.

I don't know every function in modules like itertools and collections. Is there something in there I can use? And what is this kind of traversal called, so that I can research it on my own?

I can find many "flatten dict" methods, but none that handle a structure with 'data' and 'name' keys like this.

Stals
xeor
  • I think it's a crazy dict. Why do you have such a dict? Is there no way you can get a better one? – Netwave Dec 20 '12 at 10:42
  • This looks a lot like `json` data. Is that what it is? If so, you can use the json module directly – inspectorG4dget Dec 20 '12 at 10:42
  • Seems like a JSON to me. – Rohit Jain Dec 20 '12 at 10:43
  • Please give a basic idea of how this should work: sample input, sample output, and what you have tried for a function. – Inbar Rose Dec 20 '12 at 10:47
  • The dict is a result from parsing a yaml like data structure (which is not yaml, but looks like it). – xeor Dec 20 '12 at 10:59
  • Added a little explanation for the output data. – xeor Dec 20 '12 at 11:01
  • Looking at this dict, the *very first thing* that I would do to it is convert each dictionary, at every level, from having a `'data'` key and a `'name'` key to having the `'name'` value be a key pointing to the `'data'` value. That will make it less painful to look at, think about, and work with. – Mark Amery Dec 20 '12 at 11:11
  • Also, thinking about it, what's ridiculous about this structure is that some of the lists (like the outermost list shown) are genuine lists of alike objects, whereas some of the lists (like the inner lists in this sample) actually represent single objects, but are just lists of the key/value pairs that make up the object's attributes. That's going to make parsing this more irritating than it needs to be, because a list can essentially have one of two very different meanings and you need to look at the contents to know which is being used. – Mark Amery Dec 20 '12 at 11:15
  • It may not be relevant to answering your question, but is it okay if I ask *why on earth* you have to work with this unholy creature (i.e. its source), and what the ultimate end goal is (what you're doing with this data)? – Mark Amery Dec 20 '12 at 11:16
  • The data I am trying to parse is organized in a YAML-like syntax (but is not YAML at all). I am building this dicts-in-a-list monster because some of the names in the list repeat themselves. So instead of a bunch of keys colliding, this is the way I have to store it. The source I am getting this from is a 3rd-party information script that gives me an overview of some data structures. – xeor Dec 20 '12 at 11:33
  • @xeor Ah, repeating keys. That partially explains this horrible format. :) – Mark Amery Dec 20 '12 at 11:36

1 Answer


This is a wonderful example of what recursion is good for:

input_ = {'data': [{'data': [{'data': 'gen1', 'name': 'objectID'},
                   {'data': 'familyX', 'name': 'family'}],
          'name': 'An-instance-of-A'},
         {'data': [{'data': 'gen2', 'name': 'objectID'},
                   {'data': 'familyY', 'name': 'family'},
                   {'data': [{'data': [{'data': '21',
                                        'name': 'objectID'},
                                       {'data': 'name-for-21',
                                        'name': 'name'},
                                       {'data': 'no-name', 'name': None}],
                              'name': 'An-instance-of-X:'},
                             {'data': [{'data': '22',
                                        'name': 'objectID'}],
                              'name': 'An-instance-of-X:'}],
                    'name': 'List-of-2-X-elements:'}],
          'name': 'An-instance-of-A'}],
'name': 'main'}

def parse_dict(d, predecessors, output):
    """Recurse into dict and fill list of path-value-pairs"""
    data = d["data"]
    name = d["name"]
    name = name.strip(":") if type(name) is str else name
    if type(data) is list:
        for d_ in data:
            parse_dict(d_, predecessors + [name], output)
    else:
        output.append(("_".join(map(str,predecessors+[name])), data))

result = []

parse_dict(input_, [], result)

# Parenthesized print with a single argument runs on both Python 2 and Python 3
print("\n".join("%s=%s" % (path, value) for path, value in result))

Output:

main_An-instance-of-A_objectID=gen1
main_An-instance-of-A_family=familyX
main_An-instance-of-A_objectID=gen2
main_An-instance-of-A_family=familyY
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_objectID=21
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_name=name-for-21
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_None=no-name
main_An-instance-of-A_List-of-2-X-elements_An-instance-of-X_objectID=22

I hope I understood your requirements correctly. If you don't want to join the paths into strings, you can keep the list of predecessors instead.
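A minimal sketch of that variant, assuming Python 3: instead of joining the names into a string, each leaf is yielded together with its path as a tuple, so the caller can format or filter it later. The function name `iter_leaves` and the small `sample` dict are made up for illustration; they are not part of the code above.

```python
def iter_leaves(node, predecessors=()):
    """Yield (path, value) pairs; path is a tuple of all names down to the leaf."""
    name = node["name"]
    if isinstance(name, str):
        name = name.strip(":")  # drop trailing colons, as in parse_dict
    path = predecessors + (name,)
    data = node["data"]
    if isinstance(data, list):
        # 'data' holds child dicts: recurse with the extended path
        for child in data:
            yield from iter_leaves(child, path)
    else:
        # 'data' is a plain value: this is a leaf
        yield path, data


# Hypothetical, shortened input in the same 'name'/'data' layout
sample = {
    "name": "main",
    "data": [
        {"name": "An-instance-of-X:", "data": [
            {"name": "objectID", "data": "21"},
            {"name": None, "data": "no-name"},
        ]},
    ],
}

leaves = list(iter_leaves(sample))
for path, value in leaves:
    print(path, value)
```

Because the path stays a tuple, dropping the leading `main` or grouping leaves by their parent is a matter of slicing `path` rather than parsing a joined string.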

Greetings,

Thorsten

Thorsten Kranz
  • The OP's example strips out trailing colons in names - you might want to add that behavior. Other than that, I can't find anything wrong here, although I still haven't the slightest idea why the OP wants to print the data in this way, especially given that it makes it hard to tell the different instances of X in the same list apart. – Mark Amery Dec 20 '12 at 11:33
  • I'm not going to read this myself. My log analyzer will do that :) So this is the perfect format for it. – xeor Dec 20 '12 at 11:46
  • This solution is almost what I was looking for, thanks! I will polish it a little to fit the exact needs. – xeor Dec 20 '12 at 11:52
  • Glad this helps. Mark, I included your suggestion, thanks for that. – Thorsten Kranz Dec 20 '12 at 11:54
  • `if type(data) is list:`? I think the preferred form is now `if isinstance(data, list):`. See http://stackoverflow.com/questions/1549801/differences-between-isinstance-and-type-in-python – PaulMcG Dec 20 '12 at 14:16
  • Sure, Paul, you're right, but for simple cases where no derived types of list are expected, I really like the elegance of Python syntax - almost like natural language. – Thorsten Kranz Dec 20 '12 at 23:01
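To illustrate the `type(...) is list` versus `isinstance` point from the last two comments, here is a small sketch. The `TaggedList` subclass and both helper functions are hypothetical, standing in for a parser that returns its own list type; nothing in the question says the real parser does this.

```python
class TaggedList(list):
    """Hypothetical list subclass, e.g. one a custom parser might return."""


def is_container_strict(data):
    # Exact type check: only matches plain list objects
    return type(data) is list


def is_container(data):
    # isinstance also accepts subclasses of list
    return isinstance(data, list)


children = TaggedList([{"name": "objectID", "data": "21"}])

print(is_container_strict(children))  # False: TaggedList is not exactly list
print(is_container(children))         # True: TaggedList derives from list
```

With the exact type check, a subclassed list would be treated as a leaf value and printed whole instead of being recursed into, which is the case where the two forms differ.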