
With the help of this answer, I'm trying to come up with a function that searches for a key in a nested Python dict and also records the "path" of each match. My function (see below) seems to work; however, I can't save the results in a list (see the code output). I'm pretty certain that the difficulty lies in the yield statement, but I have not been able to figure it out yet.

o={
  'dict1': {
    'dict11': {
      'entry11_1':1,
      'entry11_2':2,
    },
    'dict12': {
      'entry12_1':12,
      'entry12_2':22,
    },
  },
  'dict2': {
    'dict21': {
      'entry21_1':21,
    }
  },
}


curr_pos=[]
def gen_dict_extract(key, var):
  global curr_pos
  if hasattr(var,'iteritems'):
    for k, v in var.iteritems():
      #print curr_pos
      if k == key:
        yield v,curr_pos
      if isinstance(v, dict):
        curr_pos.append(k)
        for result in gen_dict_extract(key, v):
          yield result
      elif isinstance(v, list):
        for d in v:
          for result in gen_dict_extract(key, d):
            yield result
    if len(curr_pos)>0:
      curr_pos.pop()


result_list=[]
for ind,i in enumerate(gen_dict_extract('entry12_1',o)):
  result_list.append(i)
  print result_list[-1]
print result_list[-1]

Output:

(12, ['dict1', 'dict12'])
(12, [])
cass

3 Answers


For the sake of completeness, here's a version with Serge's suggestions. I also made some additional changes so that the function can cope with any combination of nested lists and dicts.

def gen_dict_extract(key, var, curr_pos=None):
  """
  key: key to search for
  var: nested dict (or list) to search in
  curr_pos: path to var; leave as None on the initial call
  Yields (value, path) tuples, where path is a copy of the keys and
  list indices leading to the match, including the matched key.
  """
  if curr_pos is None:
    curr_pos = []
  if hasattr(var, 'iteritems'):
    for k, v in var.iteritems():
      curr_pos.append(k)
      if k == key:
        # yield a copy so that later append()/pop() calls don't alter it
        yield v, curr_pos[:]
      if isinstance(v, (dict, list)):
        # nested lists are handled by the elif branch below
        for result in gen_dict_extract(key, v, curr_pos):
          yield result
      curr_pos.pop()
  elif isinstance(var, list):
    for ind, d in enumerate(var):
      curr_pos.append(ind)  # record the list index in the path
      for result in gen_dict_extract(key, d, curr_pos):
        yield result
      curr_pos.pop()
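The function above relies on dict.iteritems(), which exists only in Python 2. As an alternative sketch for Python 3 (not part of the original answer), the version below swaps the shared curr_pos list and its append()/pop() bookkeeping for an immutable tuple path: each level builds path + (k,), so there is no shared state to copy or restore. It assumes plain dict and list containers (it checks isinstance(var, dict) rather than hasattr(var, 'iteritems')), and the `data` sample is made up for illustration.

```python
# Python 3 sketch: the path travels down the recursion as an immutable tuple,
# so there is no shared list to copy, append to, or pop from.

def gen_dict_extract(key, var, path=()):
    """Yield (value, path) for every occurrence of `key` in nested dicts/lists.

    `path` lists the keys and list indices leading to the match,
    including the matched key itself (same convention as the version above).
    """
    if isinstance(var, dict):
        for k, v in var.items():
            here = path + (k,)          # a new tuple; the caller's `path` is untouched
            if k == key:
                yield v, list(here)     # list() only to match the output format in the thread
            # Non-container values simply yield nothing when recursed into.
            yield from gen_dict_extract(key, v, here)
    elif isinstance(var, list):
        for ind, item in enumerate(var):
            yield from gen_dict_extract(key, item, path + (ind,))


o = {
    'dict1': {
        'dict11': {'entry11_1': 1, 'entry11_2': 2},
        'dict12': {'entry12_1': 12, 'entry12_2': 22},
    },
    'dict2': {'dict21': {'entry21_1': 21}},
}

# Hypothetical sample mixing lists and dicts.
data = {'a': [{'x': 1}, {'b': {'x': 2}}]}

print(list(gen_dict_extract('entry12_1', o)))  # [(12, ['dict1', 'dict12', 'entry12_1'])]
print(list(gen_dict_extract('x', data)))       # [(1, ['a', 0, 'x']), (2, ['a', 1, 'b', 'x'])]
```

Because every recursive call gets its own tuple, the results can be collected with list() directly, without any copying on the caller's side.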
cass

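The problem is that i is a tuple that still holds a reference to the shared curr_pos list, so its contents change when the generator later modifies that list. You need to copy i to avoid this overwrite.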

import copy
result_list = []
for ind, i in enumerate(gen_dict_extract('entry12_1', o)):
    result_list.append(copy.deepcopy(i))
print result_list
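The copy has to be a deep one: a tuple is immutable, so copy.copy(i) gives back a tuple that still points at the same list. The Python 3 sketch below uses a hand-built tuple as a stand-in for what the generator yields, to show the difference.

```python
import copy

# Hand-built stand-in for what the generator yields: (value, shared path list).
curr_pos = ['dict1', 'dict12']
i = (12, curr_pos)

shallow = copy.copy(i)    # tuples are immutable, so the inner list stays shared
deep = copy.deepcopy(i)   # recursively copies the inner list as well

curr_pos.pop()            # the generator mutates its list after the yield
print(shallow)  # (12, ['dict1'])
print(deep)     # (12, ['dict1', 'dict12'])
```

Note that deepcopy also copies the matched value; if that value is itself a nested dict, you get an independent copy rather than a reference into o, so copying only the path, e.g. (i[0], list(i[1])), may be preferable.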
djangoliv

In gen_dict_extract you use a global list curr_pos and directly yield it when you have found the key (yield v,curr_pos). But a list is a mutable type, and you later modify it (curr_pos.pop()).

What you have stored in result_list is just a reference to the global object, so it contains the expected value inside the loop but has been emptied by the time the loop ends. You should instead yield a shallow copy: yield v,curr_pos[:]

You will then get as expected:

(12, ['dict1', 'dict12'])
(12, ['dict1', 'dict12'])
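To see the aliasing in isolation, here is a minimal Python 3 illustration; gen_shared and gen_copied are hypothetical names, not from the thread. Both push and pop on one list the way curr_pos does, and collecting them with list() plays the role of result_list.

```python
# Two hypothetical generators that build a "path" in one list,
# pushing and popping as they go, like curr_pos in the question.

def gen_shared():
    path = []
    for name in ('a', 'b'):
        path.append(name)
        yield path        # the same list object is yielded every time
        path.pop()

def gen_copied():
    path = []
    for name in ('a', 'b'):
        path.append(name)
        yield path[:]     # a shallow copy: a snapshot of the path right now
        path.pop()

shared = list(gen_shared())
copied = list(gen_copied())
print(shared)                  # [[], []]
print(shared[0] is shared[1])  # True
print(copied)                  # [['a'], ['b']]
```

A shallow copy is enough here because the path holds only strings and integers, which are immutable.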

BTW, if you want to avoid a global list, you could pass the list as an optional parameter:

def gen_dict_extract(key, var, curr_pos = None):
    if curr_pos is None:
        curr_pos = []
    ...
        for result in gen_dict_extract(key, v, curr_pos):
    ...
          for result in gen_dict_extract(key, d, curr_pos):
    ...

That would ensure that you use a new list on each fresh invocation, while correctly passing it along when recursing.
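The reason for the None default rather than curr_pos=[] is that Python evaluates default arguments once, when the def statement runs. A minimal Python 3 illustration with two hypothetical helpers, append_bad and append_good:

```python
# Hypothetical helpers illustrating why the thread uses `curr_pos=None`.

def append_bad(item, acc=[]):
    # This default list is created once, when `def` runs,
    # so every call that omits `acc` appends to the same list.
    acc.append(item)
    return acc

def append_good(item, acc=None):
    # A None sentinel means each fresh call builds its own new list.
    if acc is None:
        acc = []
    acc.append(item)
    return acc

first = append_bad(1)
second = append_bad(2)
print(first is second, second)         # True [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```

Recursive calls still pass the list explicitly, so within one search they share it as intended; only top-level calls start fresh.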

Serge Ballesta