To answer your question about `dict` vs `list` you'd have to give exact information about the number of elements, the number of missing elements, etc., so that we could estimate the memory usage of the two data structures and try to predict and/or measure their performance.
In general a `dict` of N key-value pairs requires quite a bit more memory than a `list` with N values:
- The `dict` must keep track of the keys as well as the values
- The `dict` is never more than 2/3 full: once it reaches that threshold, the allocated table is roughly doubled (this is required to keep operations on the `dict` at O(1) amortized time)
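You can watch this resizing happen by tracking a `dict`'s reported size as keys are inserted. This is a rough sketch: the exact sizes and growth factors are CPython implementation details and vary between versions.

```python
import sys

# Watch the dict's allocation grow in steps as keys are inserted.
# Exact sizes and resize thresholds are CPython implementation
# details and differ between versions.
d = {}
sizes = []
for i in range(1000):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# The size only ever grows while inserting, and it grows in jumps
# (whole-table resizes), not by one slot per key.
print(sorted(set(sizes)))
```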
However there is an alternative to these data structures which should provide very good performance: `blist`. The `blist` package provides an interface that matches that of `list`, only it is implemented using B-trees. It can efficiently handle sparse lists. Most operations take either O(1) or O(log n) time, so they are quite efficient.
For example you could first create a sparse `blist` doing:

```python
from blist import blist

seq = blist([None])
seq *= 2**30  # create a 2**30 element blist. Instantaneous!
```
And then you can set only the indices that actually have a value:

```python
for i, value in zip(indices, values):
    seq[i] = value
```
The full documentation is here.
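For comparison, the plain-`dict` way to store the same kind of sparse data is just as short; the `indices` and `values` below are made up to stand in for yours:

```python
# Hypothetical sparse data, standing in for the question's
# actual indices and values.
indices = [3, 100, 70_000, 2**20]
values = ["a", "b", "c", "d"]

# The dict equivalent of the sparse assignment above:
sparse = {i: v for i, v in zip(indices, values)}

# Missing indices simply aren't stored; supply a default on lookup.
print(sparse.get(5))  # prints None: index 5 was never set
```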
Note that `blist` provides other efficient operations, such as:

- Concatenating two `blist`s takes O(log n) time
- Taking an `[i:j]` slice takes O(log n) time
- Inserting an element at a given index takes O(log n) operations
- Popping an element (from any position, not just the end) takes O(log n) operations
Since you gave some numbers, here's how they compare to `dict`s:

```python
>>> from blist import blist
>>> b = blist([None])
>>> b *= 5000
>>> for i in range(100): b[i] = i
...
>>> b.__sizeof__()
2660
>>> d = dict()
>>> for i in range(100): d[i] = i
...
>>> d.__sizeof__()
6216
>>> b = blist([None])
>>> b *= 750
>>> for i in range(30): b[i] = i
...
>>> b.__sizeof__()
1580
>>> d = dict()
>>> for i in range(30): d[i] = i
...
>>> d.__sizeof__()
1608
```
In both cases the `blist` takes less memory (in the first case it takes about 1/3 of the memory of the equivalent `dict`). Note that the memory taken by a `blist` also depends on whether or not the indices are contiguous (contiguous is better). However even using random indices:
```python
>>> b = blist([None])
>>> b *= 5000
>>> import random
>>> for i in range(100): b[random.randint(0, 4999)] = i
...
>>> b.__sizeof__()
2916
```
it's still much better than the `dict`.
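If you want to sanity-check the dense-`list` side of this trade-off without installing `blist`, `sys.getsizeof` on a plain `list` and a `dict` shows why a sparse structure is attractive (exact numbers are CPython- and version-specific):

```python
import sys

# A dense 5000-slot list pays for every slot, filled or not...
lst = [None] * 5000
for i in range(100):
    lst[i] = i

# ...while a dict only pays for its 100 entries plus hash-table overhead.
d = {i: i for i in range(100)}

print(sys.getsizeof(lst), sys.getsizeof(d))
# The list is far larger here, which is exactly the waste that a
# structure like blist avoids while keeping list-style indexing.
```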
Even lookup times are better:
```python
In [1]: from blist import blist
   ...: import random

In [2]: b = blist([None])

In [3]: b *= 5000

In [4]: for i in range(100): b[random.randint(0, 4999)] = i

In [5]: %timeit b[0]
10000000 loops, best of 3: 50.7 ns per loop

In [6]: d = dict()

In [7]: for i in range(100): d[random.randint(0, 4999)] = i

In [10]: %timeit d[1024]  # 1024 is an existing key in this dictionary
10000000 loops, best of 3: 70.7 ns per loop

In [11]: %timeit b[1024]
10000000 loops, best of 3: 50.7 ns per loop
```
Note that a plain `list` takes about 47 ns to look up an index on my machine, so `blist` is really close to a `list` in terms of lookup performance on small lists like the ones you have.
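You can reproduce the `list`/`dict` half of these timings with the stdlib `timeit` module; absolute numbers are machine-dependent, so treat them as relative:

```python
import timeit

# Dense list with the first 100 slots filled, as in the examples above.
lst = [None] * 5000
for i in range(100):
    lst[i] = i
d = {i: i for i in range(100)}

# Time a single-index lookup on each structure.
list_time = timeit.timeit("lst[50]", globals={"lst": lst}, number=1_000_000)
dict_time = timeit.timeit("d[50]", globals={"d": d}, number=1_000_000)
print(f"list: {list_time:.3f}s  dict: {dict_time:.3f}s per 10**6 lookups")
```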