I'm trying to figure out how to map a function over a recursive structure containing both dictionaries and lists. So far I've got this:

import collections


def rec_walk(l):
    for v in l:
        if isinstance(v, list):
            yield from rec_walk(v)
        else:
            yield v


def rec_map(l, f):
    for v in l:
        if isinstance(v, collections.Iterable):
            if isinstance(v, list):
                yield list(rec_map(v, f))
            elif isinstance(v, dict):
                yield dict(rec_map(v, f))
        else:
            yield f(v)


a = ["0", ["1", "2", ["3", "4"]], [[[[["5"]]]]]]
print(list(rec_map(a, lambda x: x + "_tweaked")))
b = {
    'a': ["0", "1"],
    'b': [[[[[["2"]]]]]],
    'c': {
        'd': [{
            'e': [[[[[[["3"]]]]]]]
        }]
    }
}
print(dict(rec_map(b, lambda x: x + "_tweaked")))

Output:

[[[]], [[[[[]]]]]]
{}

As you can see, the problem with the above example is that rec_map does not return a properly mapped structure. What I'm trying to get is either the same structure mapped in place or a new, cloned, mapped one. For example, something like this:

a = ["0", ["1", "2", ["3", "4"]], [[[[["5"]]]]]]
rec_map(a, lambda x: x + "_tweaked")

should transform a into:

["0_tweaked", ["1_tweaked", "2_tweaked", ["3_tweaked", "4_tweaked"]], [[[[["5_tweaked"]]]]]]

and:

b = {
    'a': ["0", "1"],
    'b': [[[[[["2"]]]]]],
    'c': {
        'd': [{
            'e': [[[[[[["3"]]]]]]]
        }]
    }
}
print(dict(rec_map(b, lambda x: x + "_tweaked")))

into:

b = {
    'a': ["0_tweaked", "1_tweaked"],
    'b': [[[[[["2_tweaked"]]]]]],
    'c': {
        'd': [{
            'e': [[[[[[["3_tweaked"]]]]]]]
        }]
    }
}
BPL

2 Answers

1

This is due to yield from. You should use yield list() instead.

yield from yields each element of the inner generator one at a time, but what you want here is to yield the whole list as a single item.

The question "What's the difference between yield from and yield in Python 3.3.2+" explains the difference.
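As a minimal sketch of that difference (the names flatten and keep_shape are made up for this illustration and are not from the question), compare what each form produces on the same nested list:

```python
def flatten(items):
    """`yield from` splices the inner generator in, so nesting is lost."""
    for v in items:
        if isinstance(v, list):
            yield from flatten(v)
        else:
            yield v


def keep_shape(items):
    """`yield list(...)` emits each nested list as one item, so nesting survives."""
    for v in items:
        if isinstance(v, list):
            yield list(keep_shape(v))
        else:
            yield v


data = ["0", ["1", ["2"]]]
print(list(flatten(data)))     # ['0', '1', '2']
print(list(keep_shape(data)))  # ['0', ['1', ['2']]]
```

The first form is what you want for walking the leaves; the second is what you want when the result has to mirror the input's shape.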

The following modified version of your code produces the behavior you want:

def rec_walk(l):
    for v in l:
        if isinstance(v, list):
            yield list(rec_walk(v))
        else:
            yield v


def rec_map(l, f):
    for v in l:
        if isinstance(v, list):
            yield list(rec_map(v, f))
        else:
            yield f(v)


a = ["0", ["1", "2", ["3", "4"]], [[[[["5"]]]]]]
print('-' * 80)
print(list(rec_walk(a)))
print('-' * 80)
print(list(rec_map(a, lambda x: x + "_tweaked")))
Haochen Wu
  • What problem do you have there? Maybe check if an object is iterable instead of if it's a list? – Haochen Wu Oct 19 '17 at 18:37
  • for v in l only iterates over the keys when l is a dict. That is the problem here, but I don't have a very clean solution to it at hand now. I'll think about it. – Haochen Wu Oct 19 '17 at 18:59
  • https://stackoverflow.com/questions/10756427/loop-through-all-nested-dictionary-values This one is relevant but may not solve all your problems. – Haochen Wu Oct 19 '17 at 19:01
  • https://stackoverflow.com/questions/11501090/iterate-over-nested-lists-and-dictionaries It seems this is what you want, but it's still kinda hacky. – Haochen Wu Oct 19 '17 at 19:03
1

You are creating a generator, then using yield from, which essentially flattens. You'll want to materialize the generator instead of yielding from it:

In [1]: def rec_map(l, f):
   ...:     for v in l:
   ...:         if isinstance(v, list):
   ...:             yield list(rec_map(v, f))
   ...:         else:
   ...:             yield f(v)
   ...:

In [2]: a = ["0", ["1", "2", ["3", "4"]], [[[[["5"]]]]]]
   ...:

In [3]: list(rec_map(a, lambda x: x + "_tweaked"))
Out[3]:
['0_tweaked',
 ['1_tweaked', '2_tweaked', ['3_tweaked', '4_tweaked']],
 [[[[['5_tweaked']]]]]]

The problem you are encountering is that it is much more difficult to do this with a generator, because you have to carefully curate what is returned. Honestly, it doesn't seem like you even need a generator; just use:

In [16]: def rec_map(l, f):
    ...:     if isinstance(l, list):
    ...:         return [rec_map(v, f) for v in l]
    ...:     elif isinstance(l, dict):
    ...:         return {k:rec_map(v, f) for k,v in l.items()}
    ...:     else:
    ...:         return f(l)
    ...:

In [17]: rec_map(b, lambda x: x + '_tweaked')
Out[17]:
{'a': ['0_tweaked', '1_tweaked'],
 'b': [[[[[['2_tweaked']]]]]],
 'c': {'d': [{'e': [[[[[[['3_tweaked']]]]]]]}]}}
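To illustrate what this version returns, here is a small self-contained sketch. It uses a shortened input rather than the full b from the question, and the variable names original and mapped are only for this example. The function builds a new structure and leaves the input alone, and dictionary keys are kept as they are; only the leaf values go through f.

```python
def rec_map(obj, f):
    # Rebuild lists and dicts recursively; apply f to everything else.
    if isinstance(obj, list):
        return [rec_map(v, f) for v in obj]
    elif isinstance(obj, dict):
        return {k: rec_map(v, f) for k, v in obj.items()}
    else:
        return f(obj)


original = {"a": ["0", "1"], "c": {"d": [{"e": [["3"]]}]}}
mapped = rec_map(original, lambda x: x + "_tweaked")

print(mapped)    # {'a': ['0_tweaked', '1_tweaked'], 'c': {'d': [{'e': [['3_tweaked']]}]}}
print(original)  # {'a': ['0', '1'], 'c': {'d': [{'e': [['3']]}]}}
```

If you need the original object updated rather than cloned, you would have to assign back into each list index and dict key instead of building new containers.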

Also, don't use collections.Iterable (on Python 3.10+ it is only available as collections.abc.Iterable); check explicitly for the types you are handling. Note:

In [18]: isinstance('I am a string but I am iterable!', collections.Iterable)
Out[18]: True
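A short sketch of why the explicit check is safer, assuming a modern Python where the ABC is imported from collections.abc (the helper name is_container is hypothetical, introduced only for this example):

```python
from collections.abc import Iterable

# Strings, lists and dicts all count as Iterable; only the int does not.
for value in ["text", [1, 2], {"k": 1}, 42]:
    print(type(value).__name__, isinstance(value, Iterable))

# Iterating a one-character string gives back the same string, so a
# recursion that descends into every Iterable would never reach a leaf.
print(next(iter("a")) == "a")  # True


def is_container(obj):
    """Treat only the container types we actually rebuild as containers."""
    return isinstance(obj, (list, dict))


print(is_container("text"))    # False
print(is_container(["text"]))  # True
```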
juanpa.arrivillaga
  • @BPL it's the same principle, you need to materialize it into a dictionary or list depending on the container you are iterating into – juanpa.arrivillaga Oct 19 '17 at 18:36
  • Thanks a bunch, your solution is a really clean one without using generators. I'd give you another like... but you know, I've already given you 1 ;) – BPL Oct 19 '17 at 19:18