How can I split a dictionary of lists into chunks of a given size in Python? I can chunk a list, and I can chunk a dictionary, but I can't quite work out how to chunk a dictionary of lists (efficiently).

my_dict = {
  "key1": ["a","b","c","d"],
  "key2": ["e","f","g","h"],
}

How can I chunk it so that each chunk holds no more than 3 values in total, like this:

{
  "key1": ["a","b","c"]
}
{
  "key1": ["d"],
  "key2": ["e","f"],
}
{
  "key2": ["g","h"],
}

Notice how the 2nd dictionary spans 2 keys.
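
The requirements above can be written down as a small check that any of the answers below can be tested against. This is a hypothetical helper (`is_valid_chunking` is not part of the question): it only verifies that every chunk holds between 1 and `limit` values and that gluing the pieces back together, key by key, restores the original lists. It does not check that keys appear in their original order.

```python
def is_valid_chunking(original, chunks, limit):
    """Hypothetical helper: True if `chunks` is a valid split of `original`."""
    merged = {}
    for chunk in chunks:
        # Each chunk must hold between 1 and `limit` values across all its keys.
        total = sum(len(values) for values in chunk.values())
        if total == 0 or total > limit:
            return False
        # Re-join the pieces key by key, in chunk order.
        for key, values in chunk.items():
            if values:  # ignore empty lists; they carry no values
                merged.setdefault(key, []).extend(values)
    # Likewise ignore keys whose original list is empty.
    expected = {key: list(values) for key, values in original.items() if values}
    return merged == expected


my_dict = {
    "key1": ["a", "b", "c", "d"],
    "key2": ["e", "f", "g", "h"],
}
desired = [
    {"key1": ["a", "b", "c"]},
    {"key1": ["d"], "key2": ["e", "f"]},
    {"key2": ["g", "h"]},
]
print(is_valid_chunking(my_dict, desired, 3))  # True
```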

trubliphone
  • It shouldn't be that hard to do something naive, but possibly slow. Do that, then worry about making it faster or "neater". – chepner Mar 01 '22 at 17:07

4 Answers

A fairly naive generator would work like this:

  1. Initialize a dict and a counter.
  2. Iterate over all values (that is, every element of the lists that are the dict's values).
  3. Add each value to the new dict under its key, and increment the counter.
  4. Once the required size is reached, yield the chunk and re-initialize the dict and counter.
  5. Go back to step 2.
  6. After the loop, yield the last chunk if it is not empty.

That would be:

from collections import defaultdict

def chunker(d, chunk_size):
    chunk = defaultdict(list)
    size = 0
    for key, list_value in d.items():
        for value in list_value:
            chunk[key].append(value)
            size += 1
            if size == chunk_size:
                yield dict(chunk)
                chunk = defaultdict(list)
                size = 0
    if size:
        yield dict(chunk)

And running:

my_dict = {
  "key1": ["a","b","c","d"],
  "key2": ["e","f","g","h"],
}

for chunk in chunker(my_dict, 3):
    print(chunk)

Will give:

{'key1': ['a', 'b', 'c']}
{'key1': ['d'], 'key2': ['e', 'f']}
{'key2': ['g', 'h']}

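Instead of a plain dict, this uses a defaultdict so that each key's list is created automatically the first time the key is seen; every chunk is converted back to a regular dict before it is yielded.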
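
For comparison, here is a sketch of the same generator using a plain dict and `dict.setdefault` instead of a defaultdict. The name `chunker_plain` and the `chunk_size` check are additions for illustration, not part of the answer above.

```python
def chunker_plain(d, chunk_size):
    """Same algorithm as chunker(), using a plain dict instead of a defaultdict."""
    if chunk_size < 1:
        # with chunk_size 0 the size == chunk_size test would never fire
        raise ValueError("chunk_size must be at least 1")
    chunk = {}
    size = 0
    for key, list_value in d.items():
        for value in list_value:
            # setdefault creates the key's list the first time the key is seen
            chunk.setdefault(key, []).append(value)
            size += 1
            if size == chunk_size:
                yield chunk
                chunk = {}  # rebind, so the yielded dict is left untouched
                size = 0
    if size:
        yield chunk  # final, partially filled chunk


my_dict = {
    "key1": ["a", "b", "c", "d"],
    "key2": ["e", "f", "g", "h"],
}

for chunk in chunker_plain(my_dict, 3):
    print(chunk)
```

This prints the same three chunks as the defaultdict version; the only difference is where the empty list for a new key gets created.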

Tomerikoo

This requires a single pass, but isn't the most elegant solution by any means:

def chunk_dict(in_dict, chunk_size):
    chunked = []
    items_left = 0  # 0 forces a new chunk to be started for the first element
    for key in in_dict:
        for el in in_dict[key]:
            if items_left == 0:
                # current chunk is full (or none exists yet): start a new one
                chunked.append({})
                items_left = chunk_size
            target_dict = chunked[-1]
            if key not in target_dict:
                target_dict[key] = []
            target_dict[key].append(el)
            items_left -= 1
    return chunked

Jeremy

I see there are already answers, and they're great (I'm slow). I do it a bit differently, though: instead of adding items to the lists one by one, I add them in batches by slicing. You can use yield there instead of building a list if you need a generator.

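def split(_dict, limit=3):
    result = []
    room = 0  # free slots left in the current chunk (0 = need a new chunk)
    for key, val in _dict.items():
        cur_val = val
        while cur_val:
            if room < 1:
                # current chunk is full: start a new one
                result.append({})
                room = limit
            cut = cur_val[:room]  # take as many values as still fit
            room -= len(cut)
            result[-1][key] = cut
            cur_val = cur_val[len(cut):]  # keep the rest for the next chunk
    return result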

>>> split(my_dict, 3)
[{'key1': ['a', 'b', 'c']}, {'key1': ['d'], 'key2': ['e', 'f']}, {'key2': ['g', 'h']}]
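
As the answer says, split can be turned into a generator. A minimal sketch of that (the name `split_iter` and the `limit` check are additions): instead of appending to a result list, it yields each chunk as soon as it is full, and the last partial chunk at the end.

```python
def split_iter(_dict, limit=3):
    """Generator version of split(): yields each chunk as soon as it is full."""
    if limit < 1:
        # with limit 0 no value would ever fit and the while loop would never end
        raise ValueError("limit must be at least 1")
    chunk = {}
    room = limit  # free slots left in the current chunk
    for key, val in _dict.items():
        cur_val = val
        while cur_val:
            cut = cur_val[:room]  # take as many values as still fit
            chunk[key] = cut
            room -= len(cut)
            cur_val = cur_val[len(cut):]
            if room == 0:
                # the chunk is full: hand it out and start a fresh one
                yield chunk
                chunk = {}
                room = limit
    if chunk:
        yield chunk  # final, partially filled chunk


my_dict = {
    "key1": ["a", "b", "c", "d"],
    "key2": ["e", "f", "g", "h"],
}

for chunk in split_iter(my_dict, 3):
    print(chunk)
```

Because each value list is sliced rather than walked element by element, this assumes the values are sequences such as lists or tuples.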

Alexander B.

You can first "flatten" the dict of lists into (key, value_from_list) pairs. Then you can simply iterate over that list in chunks. The only tricky part is turning each chunk back into a dict of lists (turning [("key1", "a"), ("key1", "b")] into {"key1": ["a", "b"]}). For that we use a defaultdict and iterate over the pairs in each chunk:

from collections import defaultdict

def chunker(d, chunk_size):
    flat = [(key, value) for key, l in d.items() for value in l]
    for pos in range(0, len(flat), chunk_size):
        chunk = defaultdict(list)
        for key, value in flat[pos:pos + chunk_size]:
            chunk[key].append(value)
        yield dict(chunk)

And running it as:

my_dict = {
  "key1": ["a","b","c","d"],
  "key2": ["e","f","g","h"],
}

for chunk in chunker(my_dict, 3):
    print(chunk)

Will give:

{'key1': ['a', 'b', 'c']}
{'key1': ['d'], 'key2': ['e', 'f']}
{'key2': ['g', 'h']}

If you want to go the extra mile and avoid creating the flat list, you can make it a generator instead (flat = ((key, value) for key, l in d.items() for value in l)). A generator can't be sliced or passed to len(), so you would then chunk it as described in "Iterate an iterator by chunks (of n) in Python?".
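
One way to do that is to pull batches from the generator with `itertools.islice` until it runs dry. This is a sketch under that assumption; the name `chunker_lazy` and the `chunk_size` check are illustrative additions.

```python
from collections import defaultdict
from itertools import islice


def chunker_lazy(d, chunk_size):
    """Variant of chunker() that never builds the full list of (key, value) pairs."""
    if chunk_size < 1:
        # islice(flat, 0) would return nothing and the generator would stop silently
        raise ValueError("chunk_size must be at least 1")
    # Lazily flatten the dict of lists into (key, value) pairs.
    flat = ((key, value) for key, values in d.items() for value in values)
    while True:
        # Take at most chunk_size pairs from the same generator each time.
        batch = list(islice(flat, chunk_size))
        if not batch:
            return  # generator exhausted
        chunk = defaultdict(list)
        for key, value in batch:
            chunk[key].append(value)
        yield dict(chunk)


my_dict = {
    "key1": ["a", "b", "c", "d"],
    "key2": ["e", "f", "g", "h"],
}

for chunk in chunker_lazy(my_dict, 3):
    print(chunk)
```

On Python 3.12 or later, itertools.batched(flat, chunk_size) yields these batches (as tuples of pairs) directly and could replace the islice loop.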

Tomerikoo