Efficient way to convert a dict into a flat data structure (list or tuple)

Question

I implemented a breadth-first search for a board game. Here I use a dict to reflect duplicate board configurations on each level. Currently, the overall search takes nearly all the RAM I have (16GB) for one start configuration. I plan to integrate intersection checks for different start configurations. So I need read access to the configurations I found, and the dict for one level doesn't change if the level is finished.

That's why I plan to convert the dict into a flat data structure (list or tuple) with keys at [2n] position and values at [2n+1] position before I evaluate the next level.

The problem is to find a fast conversion from {1: 2, 3: 4} to [1, 2, 3, 4] for dictwith more than 10**8 items.

I found sum(dict.items(), ()) from a comment by Natim to another question, which worked, but which is too slow (it seemingly stops working for dicts with more than 10**6 items).

I'mahdi · Answer 1 · 2021-09-24T17:21:10.307

You can try this:

dct = {1:2, 3:4, 5:6, 7:8}

out = [None] * 2*len(dct)

for idx, (out[2*idx],out[2*idx+1]) in enumerate(dct.items()):
    pass

print(out)

Output:

[1, 2, 3, 4, 5, 6, 7, 8]

Check Runtime with dictionary that size is 50_000_000: (on colab)

from timeit import timeit
import operator, functools
from itertools import chain

def approach1(dct):
    li = []
    for k, v in dct.items():
        li.extend([k,v])
    return li

def approach2(dct):
    out = [None] * 2*len(dct)
    for idx, (out[2*idx],out[2*idx+1]) in enumerate(dct.items()):
        pass
    return (out)

def approach3(dct):
    return functools.reduce(operator.iconcat, dct.items(), [])

def approach4(dct):
    return list(chain.from_iterable(dct.items()))

def approach5(dct):
    return [i for t in dct.items() for i in t]
    
funcs = approach1, approach2, approach3, approach4, approach5
dct = {i:i for i in range(50_000_000)}

for _ in range(3):
    for func in funcs:
        t = timeit(lambda: func(dct), number=1)
        print('%.3f s ' % t, func.__name__)
    print()

Output:

8.825 s  approach1
13.243 s  approach2
4.506 s  approach3
3.809 s  approach4
7.881 s  approach5

8.391 s  approach1
13.159 s  approach2
4.487 s  approach3
3.854 s  approach4
7.946 s  approach5

8.391 s  approach1
13.197 s  approach2
4.448 s  approach3
3.681 s  approach4
7.904 s  approach5

Check Runtime with different size of dictionary: (on colab)

from timeit import timeit
import operator, functools
from itertools import chain
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def app_extend(dct):
    li = []
    for k, v in dct.items():
        li.extend([k,v])
    return li
def app_enumerate(dct):
    out = [None] * 2*len(dct)
    for idx, (out[2*idx],out[2*idx+1]) in enumerate(dct.items()):
        pass
    return (out)
def app_functools(dct):
    return functools.reduce(operator.iconcat, dct.items(), [])
def app_chain(dct):
    return list(chain.from_iterable(dct.items()))
def app_for(dct):
    return [i for t in dct.items() for i in t]
funcs = app_extend, app_enumerate, app_functools, app_chain, app_for
dct_rslt = {}
for dct_size in [100_000, 250_000, 500_000, 1_000_000, 2_500_000, 5_000_000, 10_000_000, 25_000_000, 50_000_000]:
    dct = {i:i for i in range(dct_size)}
    dct_rslt[str(dct_size)] = {func.__name__ : timeit(lambda: func(dct), number=1) for func in funcs}
df = pd.DataFrame(dct_rslt).T
fig, ax = plt.subplots()
fig.set_size_inches(12, 9)
sns.lineplot(data=df)
plt.xlabel('Dictionary Size')
plt.ylabel('Time(sec)')
plt.show()

... please don't forget that it has to be fast for dicts with millions of items, maybe not easy to extrapolate ... — Wolf, Sep 24 '21 at 11:15
I changed the integer literal within `range` for better readability. — Wolf, Sep 24 '21 at 12:21
@Wolf edited answer with runtime on colab, my PC was slow ;) — I'mahdi, Sep 24 '21 at 12:32

Wolf · Answer 2 · 2021-09-24T12:49:26.177

0

Appending and extending a list is quite efficient in python:

def dict_to_list(d):
    li = []
    for k, v in d.items():
        li.extend([k,v])
    return li

Therefore, the above apparently naive function beats the very compact and somehow also elegant expression list(sum(d.items(), ())) in terms of performance.

edited Sep 24 '21 at 12:49

answered Sep 24 '21 at 10:05

Wolf

9,679
7
62
108

2

``list(chain.from_iterable(a.items()))`` – jizhihaoSAMA Sep 24 '21 at 10:11
It's a bit sad that this first reaction didn't start as an answer. I think it's high likely that I keep the mantra *chain from iterable* in mind instead of the slightly faster functional recipe with more ingredients. – Wolf Sep 24 '21 at 11:48

score 0 · Answer 3 · answered Sep 24 '21 at 10:10

0

You can use a list comprehension over the items of the dict:

d = {1: 2, 3: 4}
print([i for t in d.items() for i in t])

This outputs:

[1, 2, 3, 4]

answered Sep 24 '21 at 10:10

blhsing

91,368
6
71
106

basically the same speed as the "naive function" – Wolf Sep 24 '21 at 10:44

U13-Forward · Accepted Answer · 2021-09-24T11:36:19.523

0

Use itertools function chain and classmethod alternative constructor from_iterable:

>>> from itertools import chain
>>> list(chain.from_iterable(dct.items()))
[1, 2, 3, 4]
>>>

Or with operator.iconcat and functools.reduce:

>>> import operator, functools
>>> functools.reduce(operator.iconcat, dct.items(), [])
[1, 2, 3, 4]
>>>

edited Sep 24 '21 at 11:36

answered Sep 24 '21 at 10:16

U13-Forward

69,221
14
89
114

Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/237450/discussion-between-wolf-and-u12-forward). – Wolf Sep 24 '21 at 11:02

Efficient way to convert a dict into a flat data structure (list or tuple)

4 Answers4