I believe the suggestion from @schwobaseggl is correct. From here, the complexity of accessing an element is O(1)
for both dict
and list
, this was somewhat replicated in this question, in fact in the previous setting list
was sligthly faster than dict
. I replicated your benchmark, and added other data structures, in full:
- List
- Dictionary
- Set (using pop)
- Deque (from collections)
- Tuple
The code:
import timeit
from collections import deque
class DictSolution:
DATA = {x: x for x in range(1000000)}
def compute(self, x):
return self.DATA[x]
class SetSolution:
DATA = {i for i in range(1000000)}
def compute(self, x):
return self.DATA.pop()
class DequeSolution:
DATA = deque(i for i in range(1000000))
def compute(self, x):
return self.DATA.popleft()
class ListSolution:
DATA = [i for i in range(1000000)]
def compute(self, x):
return self.DATA[x]
class TupleSolution:
DATA = tuple(i for i in range(1000000))
def compute(self, x):
return self.DATA[x]
def test_dict():
sol = DictSolution()
for i in range(1000000):
sol.compute(i)
def test_set():
sol = SetSolution()
for i in range(1000000):
sol.compute(i)
def test_deque():
sol = DequeSolution()
for i in range(1000000):
sol.compute(i)
def test_list():
sol = ListSolution()
for i in range(1000000):
sol.compute(i)
def test_tuple():
sol = TupleSolution()
for i in range(1000000):
sol.compute(i)
def test_pop_list():
sol = PopListSolution()
for i in range(1000000):
sol.compute(i)
des = timeit.timeit(test_deque, number=1)
ss = timeit.timeit(test_set, number=1)
ds = timeit.timeit(test_dict, number=1)
ls = timeit.timeit(test_list, number=1)
ts = timeit.timeit(test_tuple, number=1)
times = [("Dict Solution:", ds), ("Set Solution:", ss), ("Deque Solution:", des), ("List Solution:", ls), ("Tuple Solution:", ts)]
for label, time in sorted(times, key=lambda e: e[1]):
print(label, time)
Output
Tuple Solution: 0.1597294129896909
List Solution: 0.16653884798870422
Dict Solution: 0.17414769899914972
Set Solution: 0.190879073983524
Deque Solution: 0.1914772919844836
I ran the script several times and the results were similar with the tuple solution and list solution alternating the lead. Note that both SetSolution
and DequeSolution
were the slowest. So to answer your questions:
- Both
set
and deque
are slower because you are deleting an element from the list, in the other structures you are only accessing the elements.
- This was partially answered in previous answer, pop does not only returns an element from the data structure it also deletes the element from it. So I will expect that updating a data structure will be slower than accessing only one of the elements of it.
Notes
Although pop
works with set
for the test cases, in general this behaviour is not expected, for instance:
test = {'e' + str(i) for i in range(10)}
while test:
print(test.pop())
Output (set pop)
e5
e8
e6
e0
e1
e3
e7
e4
e9
e2
More on this topic can be found here.
I also benchmarked a solution using list.pop(0)
albeit using a smaller range: 100000
, and fewer candidates (list, tuple and dict). The results were the following:
('List Solution:', 0.018702030181884766)
('Tuple Solution:', 0.021403074264526367)
('Dict Solution:', 0.02230381965637207)
('List Pop Solution', 1.8658080101013184)
The benchmark was ran on the following setup:
Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
16GB
Ubuntu 16.04
Python 3.5.2