Handle self-references when flattening dictionary

Question

Given some arbitrary dictionary

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}

I've written a small routine to flatten it in the process of writing an answer to another question.

def recursive_flatten(mydict):
    d = {}
    for k, v in mydict.items():
        if isinstance(v, dict):
            for k2, v2 in recursive_flatten(v).items():
                d[k + '.' + k2] = v2 
        else:
            d[k] = v
    return d

It works, giving me what I want:

new_dict = recursive_flatten(mydict)

print(new_dict)
{'first.second.third.fourth': 'the end'}

And should work for just about any arbitrarily structured dictionary. Unfortunately, it does not:

mydict['new_key'] = mydict

Now recursive_flatten(mydict) will run until I run out of stack space. I'm trying to figure out how to gracefully handle self-references (basically, ignore or remove them). To complicate matters, self-references may occur for any sub-dictionary... not just the top level. How would I handle self-references elegantly? I can think of a mutable default argument, but there should be a better way... right?

Pointers appreciated, thanks for reading. I welcome any other suggestions/improvements to recursive_flatten if you have them.

One way is to put everything you encounter in a dictionary and check as you go that you haven't seen the thing before. Another way is to keep two pointers and let one of them go two steps as the other goes one. If they ever coincide, you have a loop. The question is also what you do then - abort or return something well-defined. — Arndt Jonasson, Apr 20 '18 at 17:08

Mulan · Accepted Answer · 2018-04-20T17:53:36.353

One way you can do it using set and id. Note this solution also uses generators which means we can start using our flattened dict before the entire result is computed

def recursive_flatten (mydict):
  def loop (seen, path, value):

    # if we've seen this value, skip it
    if id(value) in seen:
      return

    # if we haven't seen this value, now we have
    else:
      seen.add(id(value))

    # if this value is a dict...
    if isinstance (value, dict):
      for (k, v) in value.items ():
        yield from loop(seen, path + [k], v)

    # base case
    else:
      yield (".".join(path), value)

  # init the loop    
  yield from loop (set(), [], mydict)

Program demo

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}

for (k,v) in recursive_flatten (mydict):
  print (k, v)

# first.second.third.fourth the end

mydict['new_key'] = mydict

for (k,v) in recursive_flatten (mydict):
  print (k, v)

# first.second.third.fourth the end

We can make a slight modification if you would like to see output for self-referential values

# if we've seen this value, skip it
if (id(value) in seen):
  # this is the new line
  yield (".".join(path), "*self-reference* %d" % id(value))
  return

Now the output of the program will be

first.second.third.fourth the end
first.second.third.fourth the end
new_key *self-reference* 139700111853032

score 1 · Answer 2 · answered Apr 20 '18 at 17:16

I'm not sure what your definition of "graceful" is, but this can be done with some bookkeeping of what has been seen before in a set of object ids:

class RecursiveFlatten:
    def __init__(self):
        self.seen = set()

    def __call__(self, mydict):
        self.seen.add(id(mydict))
        d = {}
        for k, v in mydict.items():
            if isinstance(v, dict):
                if id(v) not in self.seen:
                    self.seen.add(id(v))
                    for k2, v2 in self(v).items():
                        d[k + '.' + k2] = v2
            else:
                d[k] = v
        return d

def recursive_flatten(mydict):
    return RecursiveFlatten()(mydict)

Testing it out gives me what I expect

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         },
        'second2': {
            'third2': 'the end2'
        }
     }
}

mydict['first']['second']['new_key'] = mydict
mydict['new_key'] = mydict
print(recursive_flatten(mydict))

Out:

{'first.second2.third2': 'the end2', 'first.second.third.fourth': 'the end'}

Is it common to use classes like this in Python? Overkill to localize one private binding, isn't it? — Mulan, Apr 20 '18 at 17:32
I've done it before, idk if I'd say it was overkill but thinking about it more, the same effect could likely be achieved from a nested function. That's what I get for living in C++ for 3 years. In any case your approach is certainly better — Ryan Haining, Apr 20 '18 at 17:33
No worries. I'm new to Python myself so I don't know what the community considers idiomatic. It's neat to see a solution that works with minimal modification to the OP's original code! — Mulan, Apr 20 '18 at 17:35

Handle self-references when flattening dictionary

2 Answers2

Linked