It is known that NumPy arrays can also store and process arbitrary Python objects through dtype = np.object_.
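As a small illustration of that statement, here is a minimal sketch of an object array indexed with an integer array. The names `items`, `arr`, `idx` and `picked` are made up for this example and are not part of the benchmark.

```python
import numpy as np

# Arbitrary (non-sequence) Python objects stored in a 1-D object array.
items = ['a', 3.5, {'k': 'v'}, None]
arr = np.array(items, dtype=np.object_)

# Integer-array ("fancy") indexing returns a new object array.
idx = np.array([3, 0, 2], dtype=np.int64)
picked = arr[idx]

# The array holds references to the original objects, not copies.
same_object = picked[2] is items[2]

# Convert back to a plain list when a list is needed.
picked_list = picked.tolist()
```

The object array only stores references, so indexing moves pointers around and never copies the objects themselves.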
So I decided to measure the speed of NumPy indexing compared to plain Python. Also, as I mentioned in my question, I want to cover the case when indexes is a NumPy array of integers.
The following code measures the different cases: whether or not the source lists have to be converted to NumPy arrays, and whether the result has to be converted back to a list.
import string
from timeit import timeit
import numpy as np
np.random.seed(0)
letters = np.array(list(string.ascii_letters), dtype = np.object_)
nl = letters[np.random.randint(0, len(letters), size = (10 ** 6,))]
l = nl.tolist()
ni = np.random.permutation(np.arange(nl.size, dtype = np.int64))
i = ni.tolist()
pyt = timeit(lambda: [l[si] for si in i], number = 10)
print('python:', round(pyt, 3), flush = True)
for l_from_list in [True, False]:
    for i_from_list in [True, False]:
        for l_to_list in [True, False]:
            def Do():
                cl = np.array(l, dtype = np.object_) if l_from_list else nl
                ci = np.array(i, dtype = np.int64) if i_from_list else ni
                res = cl[ci]
                res = res.tolist() if l_to_list else res
                return res
            ct = timeit(lambda: Do(), number = 10)
            print(
                'numpy:', 'l_from_list', l_from_list, 'i_from_list', i_from_list, 'l_to_list', l_to_list,
                'time', round(ct, 3), 'speedup', round(pyt / ct, 2), flush = True
            )
Output:
python: 2.279
numpy: l_from_list True i_from_list True l_to_list True time 2.924 speedup 0.78
numpy: l_from_list True i_from_list True l_to_list False time 2.805 speedup 0.81
numpy: l_from_list True i_from_list False l_to_list True time 1.457 speedup 1.56
numpy: l_from_list True i_from_list False l_to_list False time 1.312 speedup 1.74
numpy: l_from_list False i_from_list True l_to_list True time 2.352 speedup 0.97
numpy: l_from_list False i_from_list True l_to_list False time 2.209 speedup 1.03
numpy: l_from_list False i_from_list False l_to_list True time 0.894 speedup 2.55
numpy: l_from_list False i_from_list False l_to_list False time 0.75 speedup 3.04
So we can see that if we keep everything as NumPy arrays, we gain a 3x speedup! If only indexes is a NumPy array (the list is converted to an array and the result back to a list), the speedup is a more modest 1.56x, which is still very good. When everything has to be converted from lists and back, the "speedup" is 0.78x, meaning we actually slow down. Hence, if we work with lists only, then indexing through NumPy is not helpful.
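One way to act on these measurements is to choose the strategy from the input types. The sketch below is only an illustration: `take_items` is a hypothetical helper name, and it assumes the stored elements are not themselves sequences (otherwise `np.array(..., dtype = np.object_)` may build a multi-dimensional array).

```python
import numpy as np

def take_items(values, indexes):
    """Return the elements of `values` at positions `indexes`.

    Hypothetical helper: uses NumPy only where the timings above showed a gain.
    """
    if isinstance(indexes, np.ndarray):
        # Indexes are already an array: NumPy was faster in every such case,
        # even when `values` had to be converted to an array and back.
        is_array = isinstance(values, np.ndarray)
        # Assumption: elements of `values` are not sequences themselves.
        arr = values if is_array else np.array(values, dtype=np.object_)
        res = arr[indexes]
        return res if is_array else res.tolist()
    if isinstance(values, np.ndarray):
        # Array values with list indexes: roughly break-even in the timings.
        return values[np.array(indexes, dtype=np.int64)]
    # Both are plain lists: the list comprehension was the fastest option.
    return [values[i] for i in indexes]
```

The result has the same container type as `values`: a list in gives a list out, an array in gives an array out.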