2

I am receiving data in batches from an API in JSON format. I wish to store only the values, in a list.

The raw data looks like this and will always look like this, i.e. every {...} has the same structure as the first example:

data = content.get('data')
>>> [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {...}, {...}, ...]

The nested dictionary is making this harder; I need this unpacked as well.

Here is what I have, which works but it feels so bad:

unpacked = []
data = content.get('data')
for d in data:
    item = []
    for k, v in d.items():
        if k == 'b':
            for val in v.values():
                item.append(val)
        else:
            item.append(v)
    unpacked.append(item)

Output:

>>> [[1,2,3,4], [...], [...], ...]

How can I improve this?

turnip
  • Can you provide the desired output as well – Kaushik NP Oct 20 '17 at 08:48
  • Done. Let me know if it needs clarification. – turnip Oct 20 '17 at 08:50
  • If you are sure that this is the same pattern for all the elements in the list but with different keys, you can replace `k == 'b'` with `type(v) == dict` – anupsabraham Oct 20 '17 at 08:53
  • Your code does not work. For `data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]`, it gives `[[1, 'c', 'd', 4], [5, 6]]` – Kaushik NP Oct 20 '17 at 08:53
  • 2
    @anupsabraham Prefer `isinstance(v, dict)`. A simple recursive function with two branches based on `isinstance(v, dict)`, either extending with the sublist or just appending the current element, should work. – Maxime Lorant Oct 20 '17 at 08:54
  • Can the dictionaries also contain lists _inside_ them? – cs95 Oct 20 '17 at 09:03
  • @cᴏʟᴅsᴘᴇᴇᴅ, no, the format I have supplied is the only possible data input. – turnip Oct 20 '17 at 09:07
  • @KaushikNP thanks, fixed it - was a typo – turnip Oct 20 '17 at 09:07
  • For `[{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]`, should the result be `[1, 2, 3, 4, 5, 6]` or `[[1, 2, 3, 4], [5, 6]]`? – Eric Duminil Oct 20 '17 at 09:07
  • This operation isn't safe, since Python (currently) doesn't guarantee the order of dict items (in Python 3.6 insertion order is preserved, but that's currently an implementation detail that shouldn't be relied on). So to do this safely, you need some way to ensure that the keys are always unpacked in the correct order. – PM 2Ring Oct 20 '17 at 09:15

5 Answers

6

You could use a recursive function and some type tests:

data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]

def extract_nested_values(it):
    if isinstance(it, list):
        for sub_it in it:
            yield from extract_nested_values(sub_it)
    elif isinstance(it, dict):
        for value in it.values():
            yield from extract_nested_values(value)
    else:
        yield it

print(list(extract_nested_values(data)))
# [1, 2, 3, 4, 5, 6]

Note that it outputs a flat generator, not a list of lists.
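If the per-item grouping from the question is wanted instead, one option is to call the generator once per top-level element. This is only a minimal sketch that reuses the function above and the sample data from this answer; the name `grouped` is introduced here for illustration.

```python
def extract_nested_values(it):
    """Yield every non-container value found in nested lists and dicts."""
    if isinstance(it, list):
        for sub_it in it:
            yield from extract_nested_values(sub_it)
    elif isinstance(it, dict):
        for value in it.values():
            yield from extract_nested_values(value)
    else:
        yield it

data = [{'a': 1, 'b': {'c': 2, 'd': 3}, 'e': 4}, {'f': 5, 'g': 6}]

# One flat list per top-level dictionary, instead of a single flat sequence.
grouped = [list(extract_nested_values(d)) for d in data]
print(grouped)
# [[1, 2, 3, 4], [5, 6]] on Python 3.7+, where dicts keep insertion order
```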

Eric Duminil
2

Assuming your dictionaries do not contain inner lists, you could define a simple routine to unpack a nested dictionary, and iterate through each item in data using a loop.

def unpack(data):
    for k, v in data.items():
        if isinstance(v, dict):
            yield from unpack(v)
        else:
            yield v

Note that this function is as simple as it is thanks to the magic of yield from. Now, let's call it with some data.

data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]  # Data "borrowed" from Kaushik NP
result = [list(unpack(x)) for x in data]

print(result)
[[2, 3, 1, 4], [5, 6]]

Note the lack of order in your result, because of the arbitrary order of dictionaries.
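If the order of the values matters, one way around this is to name the keys explicitly rather than rely on dictionary iteration order. The sketch below assumes every record has the fixed layout given in the question (`a`, `b.c`, `b.d`, `e`); `FIELDS`, `pick` and `unpack_ordered` are hypothetical names introduced here, and a record with different keys (such as `{'f': 5, 'g': 6}`) would raise a `KeyError`.

```python
# Assumed key layout for the question's records: each entry is a path of
# keys leading to one value, listed in the order the output should use.
FIELDS = [('a',), ('b', 'c'), ('b', 'd'), ('e',)]

def pick(record, path):
    """Follow a path of keys into a nested dict and return the value found."""
    for key in path:
        record = record[key]
    return record

def unpack_ordered(record, fields=FIELDS):
    """Return the values of `record` in the order given by `fields`."""
    return [pick(record, path) for path in fields]

# Keys deliberately written in a different order than FIELDS.
data = [{'e': 4, 'b': {'d': 3, 'c': 2}, 'a': 1}]
print([unpack_ordered(d) for d in data])
# [[1, 2, 3, 4]]
```

The output order then depends only on `FIELDS`, whatever order the keys arrive in.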

cs95
1

For completeness, based on the excellent answer of Eric Duminil, here is a function that returns the maximum depth of a nested dict or list:

def depth(it, count=0):
    """Maximum depth of a nested dict or list.
    # Arguments
        it: a nested dict or list.
        count: depth of the current level; used internally by the recursion.
    # Returns
        Integer depth of the deepest nested container.
    """
    if isinstance(it, dict):
        children = it.values()
    elif isinstance(it, list):
        children = it
    else:
        return count
    # Recurse into every nested container (not just the first one found)
    # and keep the deepest result; with no nested container, return count.
    nested = [depth(v, count + 1) for v in children if isinstance(v, (list, dict))]
    return max(nested, default=count)

In the Python tradition, it is zero-based.

Adam Erickson
0

Doing it recursively:

def traverse(d):
    for key, val in d.items():
        if isinstance(val, dict):
            traverse(val)
        else:
            l.append(val)

out=[]
for d in data:
    l=[]
    traverse(d)
    out.append(l)

print(out)

#driver values :

IN : data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]
OUT : out = [[1, 2, 3, 4], [5, 6]]

EDIT: A better way to do this is to use yield, so that the function does not rely on a global variable as in the first method.

def traverse(d):
    for key, val in d.items():
        if isinstance(val, dict):
            yield from traverse(val)
        else:
            yield val

out = [list(traverse(d)) for d in data]
Kaushik NP
0

Other answers (especially @COLDSPEED's) have already covered the situation, but here is a slightly different version based on the old adage "it's better to ask forgiveness than permission", which I tend to prefer to type checking:

def unpack(data):
    try:
        for value in data.values():
            yield from unpack(value)
    except AttributeError:
        yield data


data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}]
unpacked = [list(unpack(item)) for item in data]
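A quick check of this approach, as a sketch only. It also shows one consequence of the duck-typed test: any object exposing a `.values()` method is unpacked as if it were a dict. The `Pair` class is hypothetical and exists purely to illustrate that point.

```python
def unpack(data):
    try:
        for value in data.values():
            yield from unpack(value)
    except AttributeError:
        # No .values() method: treat data as a plain value.
        yield data

class Pair:
    """Hypothetical non-dict type that happens to expose .values()."""
    def values(self):
        return [10, 20]

data = [{'a': 1, 'b': {'c': 2, 'd': 3}, 'e': 4}]
unpacked = [list(unpack(item)) for item in data]
print(unpacked)
# [[1, 2, 3, 4]] on Python 3.7+, where dicts keep insertion order

# Pair is not a dict, but it is unpacked anyway; the string is kept whole.
mixed = list(unpack({'x': Pair(), 'y': 'text'}))
print(mixed)
# [10, 20, 'text']
```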
Guillaume