39

I have found a solution but it is really slow:

def chunks(self, data, SIZE=10000):
    # Python 2: data.items() rebuilds the full list of items for every
    # chunk yielded, which is what makes this so slow for a big dict
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])

Do you have any ideas that don't use external modules (numpy, etc.)?

thefourtheye
badc0re

7 Answers

91

Since the dictionary is so big, it would be better to keep all the items involved as iterators and generators, like this:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k:data[k] for k in islice(it, SIZE)}

Sample run:

for item in chunks({i:i for i in xrange(10)}, 3):
    print(item)

Output:

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
Ahmed
thefourtheye
6

For Python 3+.

xrange() was removed in Python 3; range() now behaves the way xrange() did in Python 2.

You can use:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k:data[k] for k in islice(it, SIZE)}

Sample:

for item in chunks({i:i for i in range(10)}, 3):
    print(item)

With the following output:

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{6: 6, 7: 7, 8: 8}
{9: 9}
5

Another method is zipping iterators:

>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}

Create a list containing copies of the dict's iterator (the number of copies is the number of elements in each result dict). Passing each iterator from the chunks list to izip_longest pulls the needed number of items from the source dict (ifilter is used to drop the None padding from the zip results). With a generator expression you can lower memory usage:

>>> chunks = [d.iteritems()]*3
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
 {'e': 5, 'd': 4, 'g': 7},
 {'h': 8, 'f': 6}]
ndpu
    If taking this approach in Python 3, it's important to replace `d.iteritems()` with `iter(d.items())`, and not just `d.items()`. This is because @ndpu's approach relies on the fact that you're exhausting the same iterator (so the view object returned by `d.items()` in Python 3 doesn't fulfill the same role). Other changes that you would make are replacing `izip_longest` with `zip_longest` and `ifilter` with the built-in `filter`. – deepyaman Jul 11 '22 at 10:53
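
Following that comment, a Python 3 sketch of the same zipping approach (an adaptation, not part of the original answer) would look like:

>>> from itertools import zip_longest
>>> d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
>>> chunks = [iter(d.items())] * 3   # three references to the same iterator
>>> g = (dict(filter(None, v)) for v in zip_longest(*chunks))
>>> list(g)
[{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5, 'f': 6}, {'g': 7, 'h': 8}]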
1

This code takes a large dictionary and splits it into a list of small dictionaries. The max_limit parameter sets the maximum number of key-value pairs allowed in each sub-dictionary. The split takes little effort: just one complete pass over the dictionary object.

import copy

def split_dict_to_multiple(input_dict, max_limit=200):
    """Splits a dict into multiple dicts with a given maximum size.
    Returns a list of dictionaries."""
    chunks = []
    curr_dict = {}
    for k, v in input_dict.items():
        if len(curr_dict.keys()) < max_limit:
            curr_dict.update({k: v})
        else:
            chunks.append(copy.deepcopy(curr_dict))
            curr_dict = {k: v}
    # append the last curr_dict
    chunks.append(curr_dict)
    return chunks
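
A quick usage example (the max_limit=3 value below is only for illustration):

parts = split_dict_to_multiple({i: i for i in range(10)}, max_limit=3)
print(parts)
# [{0: 0, 1: 1, 2: 2}, {3: 3, 4: 4, 5: 5}, {6: 6, 7: 7, 8: 8}, {9: 9}]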
1

This code works in Python 3.8 and does not use any external modules:

def split_dict(d, n):
    keys = list(d.keys())
    for i in range(0, len(keys), n):
        yield {k: d[k] for k in keys[i: i + n]}


for item in split_dict({i: i for i in range(10)}, 3):
    print(item)

prints this:

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{6: 6, 7: 7, 8: 8}
{9: 9}

... and might even be slightly faster than the (currently) accepted answer of thefourtheye:

from hwcounter import count, count_end


start = count()
for item in chunks({i: i for i in range(100000)}, 3):
    pass
elapsed = count_end() - start
print(f'elapsed cycles: {elapsed}')

start = count()
for item in split_dict({i: i for i in range(100000)}, 3):
    pass
elapsed = count_end() - start
print(f'elapsed cycles: {elapsed}')

prints:

elapsed cycles: 145773597
elapsed cycles: 138041191
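
Note that hwcounter is itself an external package; a rough standard-library equivalent of the same comparison (wall-clock timings rather than cycle counts, assuming chunks and split_dict are defined as above) could be:

import timeit

data = {i: i for i in range(100000)}

# time 10 full passes over each generator
print(timeit.timeit(lambda: list(chunks(data, 3)), number=10))
print(timeit.timeit(lambda: list(split_dict(data, 3)), number=10))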
Mat
0

Something like the following should work, with only builtins:

>>> adict = {1:'a', 2:'b', 3:'c', 4:'d'}
>>> chunklen = 2
>>> dictlist = list(adict.items())
>>> [ dict(dictlist[i:i + chunklen]) for i in range(0, len(dictlist), chunklen) ]
[{1: 'a', 2: 'b'}, {3: 'c', 4: 'd'}]

This preps the original dictionary into a list of items, but you could possibly do that in a one-liner.
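
For example, folding the item list into the expression directly (just a sketch; note it rebuilds the item list for every chunk):

>>> [dict(list(adict.items())[i:i + chunklen]) for i in range(0, len(adict), chunklen)]
[{1: 'a', 2: 'b'}, {3: 'c', 4: 'd'}]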

MattK
-1
import numpy as np

chunk_size = 3
chunked_data = [[k, v] for k, v in d.items()]
# note: an integer passed to array_split is the number of sections, not the size of each chunk
chunked_data = np.array_split(chunked_data, chunk_size)

Afterwards you have an ndarray that you can iterate over like this:

for chunk in chunked_data:
    for key, value in chunk:
        print(key)
        print(value)

These chunks could be re-assembled into a list of dicts using a simple for loop.
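
For example (a sketch continuing from chunked_data above; keep in mind that array_split coerces keys and values to a common numpy dtype):

list_of_dicts = []
for chunk in chunked_data:
    list_of_dicts.append({key: value for key, value in chunk})
print(list_of_dicts)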

gies0r
    Perhaps downvoted because it's an obvious overkill to use a numpy ndarray to chunk native dictionaries. The OP expressed the need to not use any external module, explicitly mentioning numpy. – Ignatius Jan 28 '20 at 04:10