How to flatten list of dictionaries

Question

I am learning how to traverse lists and dictionaries in Python. I am a beginner.

process = [
    {
        'process1': [
            {"subprocess1": ["subprocess1_1", "subprocess1_2"]},
            "subprocess2",
            {"subprocess3": ["subprocess3_1", "subprocess3_2"]},
            "subprocess4",
            {"subprocess5": [{"subprocess5_1": ["subprocess5_1_1", "subprocess5_1_2"]}]},
        ],
    },
    {
        'process2': [
            "subprocess2_1"
        ]
    }
]

How do I flatten the above list of dictionaries into the following?

process1 = ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']
process2 = ['subprocess2_1']

Figure out first how to iterate over a simple flat list (item by item) and a simple flat dictionary (key-value pairs). Once you know that, a nested list of dicts with lists is essentially the same: loop over the outer list; each item is a dict, so loop over the keys and values of the dict; each value of the dict is a list, so loop over that, and so on. — Gino Mempin, Apr 04 '23 at 22:30
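Following that advice, here is a minimal sketch of the building blocks: iterating a flat list, a flat dictionary, and then one level of list-inside-dict nesting. The sample data and variable names are made up for illustration.

```python
# Iterating a flat list: one item at a time
fruits = ["apple", "banana"]
for item in fruits:
    print(item)

# Iterating a flat dictionary: key-value pairs via .items()
ages = {"alice": 30, "bob": 25}
for key, value in ages.items():
    print(key, value)

# One level of nesting: a list of dicts whose values are lists
nested = [{"a": ["a_1", "a_2"]}, {"b": ["b_1"]}]
collected = []
for d in nested:                   # outer list -> each item is a dict
    for key, values in d.items():  # dict -> each key and its list value
        collected.append(key)
        for v in values:           # list -> individual strings
            collected.append(v)

print(collected)  # ['a', 'a_1', 'a_2', 'b', 'b_1']
```

Each extra level of nesting would need one more loop written by hand, which is why the answers below turn to recursion for data like `process`.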

Driftr95 · Answer 1 · 2023-04-04T23:57:26.723

Nested for-loops or list comprehensions are common ways to flatten objects of fixed depth, and recursion is generally useful for flattening objects of arbitrary depth. So you can:

- flatten by one more level with each recursive call,
- use isinstance to detect dictionaries before getting .values, and
- use hasattr to check whether an input has __iter__ (and is therefore iterable).
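To make the fixed-depth vs arbitrary-depth distinction concrete: when every element is known to be a list (exactly two levels), one nested comprehension is enough, but it breaks as soon as the nesting is uneven, while a recursive function goes one level deeper per call. A minimal sketch with made-up data; `flatten_any` is a hypothetical name that only handles lists.

```python
# Fixed depth: every element is a list, so one nested comprehension flattens it
two_levels = [[1, 2], [3], [4, 5]]
flat = [x for sub in two_levels for x in sub]
print(flat)  # [1, 2, 3, 4, 5]

# The same comprehension fails when nesting is uneven
mixed = [1, [2, [3, [4]]], 5]
try:
    [x for sub in mixed for x in sub]
except TypeError:
    print("fixed-depth comprehension fails: 1 is not iterable")

# Arbitrary depth: recurse one more level with each call
def flatten_any(obj):
    if isinstance(obj, list):
        return [v for item in obj for v in flatten_any(item)]
    return [obj]  # anything that is not a list is a leaf

print(flatten_any(mixed))  # [1, 2, 3, 4, 5]
```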

You can put the nested loop in a generator function:

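def flatten_obj(obj):
    # Recurse into any iterable except strings (strings are iterable too)
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        # For dicts, flatten only the values; for other iterables, every element
        for i in (obj.values() if isinstance(obj, dict) else obj):
            yield from flatten_obj(i)
    else:
        # A string or non-iterable is a leaf: yield it as-is
        yield obj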

But if you want the function to return a list, a list comprehension might be preferable to initializing an empty list and appending to it in a nested loop.

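def get_flat_list(obj, listify_single=False):
    # A leaf (string or non-iterable) is wrapped in a list only when
    # listify_single is True, as it is in the recursive calls below
    if isinstance(obj, str) or not hasattr(obj, '__iter__'):
        return [obj] if listify_single else obj

    # For dicts, flatten only the values
    if isinstance(obj, dict):
        obj = obj.values()
    return [v for x in obj for v in get_flat_list(x, listify_single=True)]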
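Both functions rule out strings explicitly because hasattr(obj, '__iter__') is also true for str, and iterating a one-character string yields that same one-character string, so the recursion would never reach a leaf. A small sketch of what goes wrong without the check; `naive_flatten` is a hypothetical, deliberately broken variant.

```python
# Strings are iterable, so hasattr(..., '__iter__') alone cannot exclude them
print(hasattr("ab", "__iter__"))  # True

# Iterating a 1-character string yields the same 1-character string
print(list("a"))  # ['a']

def naive_flatten(obj):
    """Hypothetical broken variant of flatten_obj with no str check."""
    if hasattr(obj, "__iter__"):
        for i in obj:
            # for "a", this recurses on "a" again, forever
            yield from naive_flatten(i)
    else:
        yield obj

try:
    list(naive_flatten(["ab"]))
    hit_recursion_limit = False
except RecursionError:
    # Python stops the runaway recursion at its recursion limit
    hit_recursion_limit = True

print(hit_recursion_limit)  # True
```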

Using either

{k: list(flatten_obj(v)) for i in process for k,v in i.items()}

or

{k: get_flat_list(v) for i in process for k,v in i.items()}

should return

{
  'process1': ['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2'],
  'process2': ['subprocess2_1']
}
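Note that this output omits the dictionary keys (subprocess1, subprocess3, subprocess5, subprocess5_1) that the question's expected output includes, because `.values()` discards them. A minimal sketch of the same isinstance/hasattr approach that also yields each key before its value; `flatten_obj_with_keys` is a hypothetical name, not part of the answer.

```python
def flatten_obj_with_keys(obj):
    """Like flatten_obj, but yields each dict key before flattening its value."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield k                              # keep the key itself
            yield from flatten_obj_with_keys(v)  # then its (possibly nested) value
    elif hasattr(obj, "__iter__") and not isinstance(obj, str):
        for item in obj:
            yield from flatten_obj_with_keys(item)
    else:
        yield obj  # a string or non-iterable leaf

process = [
    {"process1": [
        {"subprocess1": ["subprocess1_1", "subprocess1_2"]},
        "subprocess2",
        {"subprocess3": ["subprocess3_1", "subprocess3_2"]},
        "subprocess4",
        {"subprocess5": [{"subprocess5_1": ["subprocess5_1_1", "subprocess5_1_2"]}]},
    ]},
    {"process2": ["subprocess2_1"]},
]

result = {k: list(flatten_obj_with_keys(v)) for d in process for k, v in d.items()}
print(result["process1"])
# ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2',
#  'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4',
#  'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']
print(result["process2"])  # ['subprocess2_1']
```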

Of course, you can also define process1 and process2 as separate variables:

process1 = get_flat_list(process[0]['process1']) 
# list(flatten_obj(process[0]['process1']))

process2 = get_flat_list(process[1]['process2']) 
# list(flatten_obj(process[1]['process2']))

or

process1, process2, *_ = [list(flatten_obj(v)) for i in process for v in i.values()]
# process1, process2, *_ = [get_flat_list(v) for i in process for v in i.values()]
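The `process1, process2, *_ = ...` form is extended iterable unpacking: the first two names take the first two results by position, and `*_` collects any remaining ones into a list (by convention, `_` marks values you ignore). The names are therefore matched by the order of the dicts in `process`, not by their keys. A small sketch with made-up values:

```python
results = [["a"], ["b"], ["c"], ["d"]]
first, second, *_ = results
print(first, second)  # ['a'] ['b']
print(_)              # [['c'], ['d']]

# With exactly two items, *_ is just an empty list
x, y, *_ = [1, 2]
print(_)  # []

# With fewer than two items, unpacking raises ValueError
try:
    x, y, *_ = [1]
except ValueError:
    print("not enough values to unpack")
```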

Love the fixed depth vs arbitrary depth point - thank you! — jtlz2, Aug 03 '23 at 18:24

score 0 · Accepted Answer · answered Apr 04 '23 at 22:32

In cases like this, a recursive function/generator is a natural fit:

def flatten(x):
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    elif isinstance(x, dict):
        for k, v in x.items():
            yield k
            yield from flatten(v)
    else:
        yield x
        
out = {k: list(flatten(v)) for d in process for k,v in d.items()}

Output:

out['process1']
# ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2',
#  'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4',
#  'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']

out['process2']
# ['subprocess2_1']
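Recursion depth is bounded by Python's recursion limit (1000 by default in CPython, see sys.getrecursionlimit()), which only matters for very deeply nested input. If that is a concern, the same key-then-values traversal can be done with an explicit stack. This is a swapped-in iterative technique rather than part of the answer, and `flatten_iter` is a hypothetical name; a minimal sketch:

```python
def flatten_iter(x):
    """Yield dict keys and leaf values depth-first, using an explicit stack."""
    stack = [x]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            # Push children in reverse so the first child is popped (yielded) first
            stack.extend(reversed(node))
        elif isinstance(node, dict):
            # Turn {k1: v1, k2: v2} into [k1, v1, k2, v2] so each key comes out
            # before the contents of its value, matching the recursive version
            pairs = [part for kv in node.items() for part in kv]
            stack.extend(reversed(pairs))
        else:
            yield node

sample = [{"a": ["a_1", {"a_2": ["a_2_1"]}]}, "b"]
print(list(flatten_iter(sample)))  # ['a', 'a_1', 'a_2', 'a_2_1', 'b']

# Nesting far deeper than the default recursion limit still works
deep = ["leaf"]
for _ in range(5000):
    deep = [deep]
print(list(flatten_iter(deep)))  # ['leaf']
```

The recursive `flatten` would raise RecursionError on `deep`; the stack-based version trades that limit for slightly less obvious code.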

GSquirrel · Answer 3 · 2023-04-04T23:22:20.013

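Here is another approach, which uses a flatten function that takes a list (whose elements may be nested lists or dictionaries) and returns a flattened list of its elements. It uses recursion to handle nested lists and dictionaries of arbitrary depth. Like Answer 1, it keeps only the dictionary values, so keys such as subprocess1 do not appear in the result.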

To use the function with your process data structure, you can extract the nested lists from the dictionary values using indexing and pass them to the flatten function. Here's how you can do it:


# define the flatten function
def flatten(lst):
    # create an empty list to store the flattened list
    result = []
    # iterate through each element in the input list
    for item in lst:
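        # if the element is a dictionary, recursively flatten its values
        # (each value is wrapped in a list so that a plain string value is kept
        # whole and a nested dict value is flattened by value, not by key)
        if isinstance(item, dict):
            for val in item.values():
                result.extend(flatten([val]))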
        # if the element is a list, recursively flatten its elements
        elif isinstance(item, list):
            result.extend(flatten(item))
        # otherwise, append the element to the result list
        else:
            result.append(item)
    # return the flattened list
    return result

# extract the nested lists and flatten them
process1 = flatten(process[0]['process1'])
process2 = flatten(process[1]['process2'])

print(process1)
print(process2)

This will output:

['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2']
['subprocess2_1']
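Indexing with process[0]['process1'] ties each name to its position in the outer list. Since each element is a one-key dict, one alternative (an assumption about how you want to look things up, not part of the answer) is to merge them into a single dict first and look up by name. This relies on the process names being unique, because a repeated key would overwrite the earlier one. A sketch with made-up data:

```python
# A list of single-key dicts, shaped like `process` in the question
items = [{"p1": ["x", "y"]}, {"p2": ["z"]}]

# Merge into one dict; dicts keep insertion order (Python 3.7+)
by_name = {k: v for d in items for k, v in d.items()}

print(by_name["p1"])  # ['x', 'y']
print(list(by_name))  # ['p1', 'p2']
```

With the question's data, `by_name['process1']` could then be passed to `flatten` without knowing its index.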
