0

I am trying to write a function which searches for a key in an arbitrarily deep nested dict and returns its value(s) together with the path(s) of ancestor keys -

# find_key.py
def find_key(key, targ, path=[]):
    """ Search an aribitarily deep nested dict `targ` for key `key`.
        Return its value(s) together with the path(s) of ancestor keys.
    """
    if key == targ:
        yield v, path
    if isinstance(targ, dict):
        for k, v in targ.items():
            if k == key:
                yield v, path
            else:
                find_key(key, v, path.append(k))
                path.pop()

# test code
targ_1 = {'a': 1, 'b': {'b': 2, 'c': 3}, 'd': {'e': {'f': 4}}}
tests = {'test_a': {'key' : 'a', 'targ': targ_1, 'expect': [(1, [])]},
         'test_b': {'key' : 'b', 'targ': targ_1, 'expect': [({'b': 2, 'c': 3}, []), (2, ['b'])]},
         'test_c': {'key' : 'c', 'targ': targ_1, 'expect': [(3, ['b'])]},
         'test_d': {'key' : 'd', 'targ': targ_1, 'expect': [({'e': {'f': 4}}, [])]},
         'test_e': {'key' : 'e', 'targ': targ_1, 'expect': [({'f': 4}, ['d'])]},
         'test_f': {'key' : 'f', 'targ': targ_1, 'expect': [(4, ['d', 'e'])]}}
for k, v in tests.items():
    if list(find_key(v['key'], v['targ'])) == v['expect']:
        print(k, 'OK')
    else:
        print(k, 'actual:', list(find_key(v['key'], v['targ'])))
        print(k, 'expected:', v['expect'])

Executing the code shows that many test cases failed -

(3.8) $ python find_key.py 
test_a OK
test_b actual: [({'b': 2, 'c': 3}, [])]
test_b expected: [({'b': 2, 'c': 3}, []), (2, ['b'])]
test_c actual: []
test_c expected: [(3, ['b'])]
test_d OK
test_e actual: []
test_e expected: [({'f': 4}, ['d'])]
test_f actual: []
test_f expected: [(4, ['d', 'e'])]

I suspect the problem lies in the recursive call find_key so I inserted a breakpoint() above the call and re-executed the file -

(3.8) $ python find_key.py 
> find_key.py(15)find_key()
-> find_key(key, v, path.append(k))
(Pdb) s
--Call--
> find_key.py(1)find_key()
-> def find_key(key, targ, path=None):
(Pdb) s
GeneratorExit
> find_key.py(1)find_key()
-> def find_key(key, targ, path=None):
(Pdb) s
--Return--
> find_key.py(1)find_key()->None
-> def find_key(key, targ, path=None):
(Pdb) s
> find_key.py(16)find_key()
-> path.pop()
(Pdb) 

As you can see, Pdb does not step into the recursively called find_key but instead issues messages GeneratorExit and --Return--. How can I debug this issue?

user2309803
  • 541
  • 5
  • 15
  • Have you considered stepping through line by line using an IDE such as VSCode, PyCharm, etc? – Cory Kramer Mar 18 '21 at 14:33
  • @CoryKramer No, I prefer to learn the basic tools first. Are you suggesting it is not possible with Pdb? – user2309803 Mar 18 '21 at 14:38
  • Debugging likely won't bring you to a satisfactory answer here. The problem is that you create a generator object when calling `find_key(key, v, path.append(k))`, but then you don't do anything with it (e.g. `yield from`). Failing to use a return value isn't something that a debugger is likely to point out. – Brian61354270 Mar 18 '21 at 14:43
  • The `append` method does not return anything. So your recursive call passes `None` to `find_key`. – qouify Mar 18 '21 at 14:53
  • @Brian I don't have much experience with `yield`, my code was inspired by the top answer to [this question](https://stackoverflow.com/questions/9807634/find-all-occurrences-of-a-key-in-nested-dictionaries-and-lists). Do you recommend I abandon this style for my particular problem? – user2309803 Mar 18 '21 at 14:56

1 Answers1

1

human brain debugging

# find_key.py
def find_key(key, targ, path=[]):
    """ Search an aribitarily deep nested dict `targ` for key `key`.
        Return its value(s) together with the path(s) of ancestor keys.
    """
    if key == targ:
        yield v, path
    if isinstance(targ, dict):
        for k, v in targ.items():
            if k == key:
                yield v, path
            else:
                find_key(key, v, path.append(k))
                path.pop()
targ_1 = {'a': 1, 'b': {'b': 2, 'c': 3}, 'd': {'e': {'f': 4}}}
list(find_key("b", targ_1))

Let's simply evaluate the program using our human brain evaluator -

key = "b"
targ = {'a': 1, 'b': {'b': 2, 'c': 3}, 'd': {'e': {'f': 4}}}
path = []
# if key == targ:
#    yield v, path
if isinstance(targ, dict):
    for k, v in targ.items():
        # ("a", 1)
        # ("b", {'b': 2, 'c': 3})
        # ("d", {'e': {'f': 4}})
        k = "a"
        v = 1
        # if k == key:
            # yield v, path
        else:
            find_key(key, v, path.append(k)) # path = ["a"]
            path.pop()                       # path = []

In the first iteration of the for loop, k is not equal to key so we run the else branch of the program. find_key returns a generator, but we don't do anything with it. Ie, no return or yield from is used. So whatever it does, we can just ignore it entirely as it won't show up in the output of our function. To understand what I mean by that, consider the following example -

def foo():
  1+10      # missing yield or return

foo()
None

In the program above, foo will evaluate 1+10 but nothing happens with it, so it is ignored. This is the same as calling a function like find_key without using its return value - python will evaluate it but the result is immediate discarded. Let's move onto the second iteration now -

key = "b"
targ = {'a': 1, 'b': {'b': 2, 'c': 3}, 'd': {'e': {'f': 4}}}
path = []
# if key == targ:
#    yield v, path
if isinstance(targ, dict):
    for k, v in targ.items():
        # ("a", 1)
        # ("b", {'b': 2, 'c': 3})
        # ("d", {'e': {'f': 4}})
        k = "b"
        v = {'b': 2, 'c': 3}
        if k == key:
            yield v, path            # ({'b': 2, 'c': 3}, [])
        # else:
            # find_key(key, v, path.append(k)) # path = ["a"]
            # path.pop()                       # path = []

Above, we see the second iteration of the for loop. Here k == key, so now we yield. Note because you use the else keyword, we won't recur find_key. That means there is no attempt to find additional key = "b" values in v.

key = "b"
targ = {'a': 1, 'b': {'b': 2, 'c': 3}, 'd': {'e': {'f': 4}}}
path = []
# if key == targ:
#    yield v, path
if isinstance(targ, dict):
    for k, v in targ.items():
        # ("a", 1)
        # ("b", {'b': 2, 'c': 3})
        # ("d", {'e': {'f': 4}})
        k = "d"
        v = {'e': {'f': 4}}
        # if k == key:
            # yield v, path
        else:
            find_key(key, v, path.append(k)) # path = ["d"]
            path.pop()                       # path = []

In the third iteration of the for loop, it is just the same as the first. k is not equal to key and so find_key is called and the result is ignored.

The output is simple to determine. The second iteration where k == "b" is the only output from our function. Since you wrap the find_key call in list(...) this is the only result we will see in the list -

[({'b': 2, 'c': 3}, [])]

fixing the problem

Here are the things I noticed -

  1. You are missing yield from before the recursive call to find_key.
  2. The conditional logic can be simplified
  3. Using else prevents recursion on values, v, where k == key
  4. Avoiding mutation .append and .pop makes reasoning about the algorithm easier
def find_key(key, targ, path = []):
  """ Search an aribitarily deep nested dict `targ` for key `key`.
      Return its value(s) together with the path(s) of ancestor keys.
  """
  if isinstance(targ, dict):
    for (k,v) in targ.items():
      if k == key:
        yield (v, path)
      yield from find_key(key, v, [*path, k])  # <- no else, yield from
test_a OK
test_b OK
test_c OK
test_d OK
test_e OK
test_f OK

path param

You have the option to remove the path parameter from find_key signature as well -

# find_key.py
def find_key(key, targ):                  # <- no path arg
  """ Search an aribitarily deep nested dict `targ` for key `key`.
      Return its value(s) together with the path(s) of ancestor keys.
  """
  if isinstance(targ, dict):
    for (k,v) in targ.items():
      if k == key:
        yield (v, [])                     # <- empty path
      for (v, path) in find_key(key, v):  # <- get path from recursive result
        yield (v, [k, *path])             # <- prepend path

direct ancestor

Finally I think it's odd that the direct ancestor doesn't appear in the result. Ie,

list(find_key("a", {"a":1}))
[(1, [])]

According to the result above, the path to 1 is empty, []. I would expect the result to be [(1, ["a"])]. Ie, the path to 1 is targ["a"]. This is an easy change -

# find_key.py
def find_key(key, targ):
  """ Search an aribitarily deep nested dict `targ` for key `key`.
      Return its value(s) together with the path(s) of ancestor keys.
  """
  if isinstance(targ, dict):
    for (k,v) in targ.items():
      if k == key:
        yield (v, [k])                    # <- base path
      for (v, path) in find_key(key, v):
        yield (v, [k, *path])
list(find_key("b", targ_1))

The path to {'b': 2, 'c': 3} is targ["b"]
The path to 2 is targ["b"]["b"]

[({'b': 2, 'c': 3}, ['b']), (2, ['b', 'b'])]
Mulan
  • 129,518
  • 31
  • 228
  • 259
  • 1
    Apparently the answer is 'no' (with a software debugger) but your response was extremely informative - thanks! Since Python represents the widely used JSON data format in similar nested compound structures via [json.load(s)](https://docs.python.org/3/library/json.html#json.load) functions, I was surprised that I could not find a widely used / debugged module which provides out of the box functionality for these kind of requirements. Is it normal for Python developers to hand code such solutions? – user2309803 Mar 19 '21 at 07:49
  • @user2309803 ha I didn't mean to imply the answer is _no_. But maybe I meant to imply that homing in on your precise issue with a debugger is sometimes more trouble than it's worth. The issue here is nothing happens with the recursive call to `find_key` but it gets run anyway and the debugger will still hit your breakpoints so it _looks_ like things are working properly. A careful read of the program with our human interpreter can detect certain issues that are hard to detect with a debugger. – Mulan Mar 19 '21 at 15:31
  • The issue is compounded by the fact that python is a multi-paradigm language. It's perfectly valid to call functions and ignore the return values. You might be able to catch it with a linter, but idk I don't use linters. Other languages might issue a warning for such a thing or use implicit returns. I've always wanted to write a stack visualizer for python programs, and I think it's possible, but it would involve writing an interpreter that is capable of understanding all valid python programs - not a small task. – Mulan Mar 19 '21 at 15:42