I'm new to Python so I've decided to solve some common challenges to improve my knowledge of the language. I learned about numpy and its efficient ndarrays so I attempted the following experiment:
Consider the 2 sum problem (e.g. here) and let's solve it the naive way (the algorithm itself doesn't matter for the purpose of this question). Here's a solution with Python's lists:
from itertools import combinations

def twosum1(n_lst):
    pairs = list(combinations(n_lst, 2))
    solutions = []
    for pair in pairs:
        if sum(pair) == 7:
            solutions.append(pair)
    return solutions
Then I created a version using np.arrays, expecting it to drastically speed up the calculation:
from itertools import combinations
import numpy as np

def twosum2(n_lst):
    pairs = np.array(list(combinations(n_lst, 2)), dtype=int)
    return pairs[pairs[:, 1] + pairs[:, 0] == 7]
However, after timing the two functions, twosum2 turned out to be about 2x slower than twosum1. I thought the problem might be in the dynamic selection of elements, so I wrote an exact copy of twosum1 that replaces lists with ndarrays ...
def twosum3(n_lst):
    pairs = np.array(list(combinations(n_lst, 2)))
    solutions = np.empty((0, 2))
    for pair in pairs:
        if np.sum(pair) == 7:
            solutions = np.append(solutions, [pair], axis=0)
    return solutions
... and the resulting function was 10x slower than the original!
How is this possible? What am I doing wrong here? Clearly, removing loops and replacing lists with ndarrays is not enough to gain speed (contrary to what I learned reading this).
Edit:
- I use %timeit in jupyter to time the functions.
- I take identical benchmarks for all the functions I'm timing.
- The fact that I calculate the combinations in the same way in all 3 functions tells me that the slowdown is due to numpy ... but I don't see how.
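For anyone who wants to reproduce the comparison outside Jupyter, here is a minimal sketch using the standard timeit module. The input `sample`, the `TARGET` constant, and the helpers `time_call` and `build_pairs_only` are assumptions made for illustration, not the exact benchmark I ran. `build_pairs_only` times just the list-to-ndarray conversion that twosum2 does first, so its cost can be compared against the whole function.

```python
from itertools import combinations
import timeit

import numpy as np

TARGET = 7  # the target sum that is hard-coded in the functions above


def twosum1(n_lst):
    # Pure-Python version: loop over all pairs and keep the matching ones.
    pairs = list(combinations(n_lst, 2))
    solutions = []
    for pair in pairs:
        if sum(pair) == TARGET:
            solutions.append(pair)
    return solutions


def twosum2(n_lst):
    # NumPy version: build an (n_pairs, 2) array, then filter with a boolean mask.
    pairs = np.array(list(combinations(n_lst, 2)), dtype=int)
    return pairs[pairs[:, 0] + pairs[:, 1] == TARGET]


def build_pairs_only(n_lst):
    # Hypothetical helper: only the list -> ndarray conversion step of twosum2.
    return np.array(list(combinations(n_lst, 2)), dtype=int)


def time_call(func, data, repeat=5, number=20):
    # Hypothetical helper: best-of-`repeat` average time per call, in seconds.
    best = min(timeit.repeat(lambda: func(data), repeat=repeat, number=number))
    return best / number


sample = list(range(-50, 50))  # assumed benchmark input, not my real data

# Sanity check: both versions should return the same pairs in the same order.
assert [tuple(p) for p in twosum2(sample).tolist()] == twosum1(sample)

timings = {
    f.__name__: time_call(f, sample)
    for f in (twosum1, twosum2, build_pairs_only)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

If `build_pairs_only` accounts for most of the time measured for `twosum2`, that would suggest the cost sits in constructing the array from a Python list rather than in the vectorized comparison itself.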