Looping through pandas dataframe for speed

Question

I'm trying to understand the fastest way to loop through a pandas DataFrame. I have read in many places that itertuples is much faster than a regular loop, and that apply is the fastest of all. If that is the case, why does the regular loop come out fastest in my timings? Maybe I'm misreading the results: what does "10 loops, best of 3" mean?

%%timeit
xlist = []
for row in toMood.itertuples():
    xlist.append(row[1] + 1)
1 loop, best of 3: 266 ms per loop

%%timeit
zlist = []
for row in toMood['user_id']:
    zlist.append(row + 1)
10 loops, best of 3: 83 ms per loop

%%timeit
tlist = toMood['user_id'].apply(lambda x: x+1)
10 loops, best of 3: 138 ms per loop
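One way to make the three timings above more comparable is to run every approach on the same, larger frame and to include a vectorised baseline. The following is only a sketch under assumptions: `frame` is a synthetic stand-in for `toMood`, the `mood` column and the row count are made up for illustration, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for toMood; only the user_id column name comes from the question.
n_rows = 100_000
frame = pd.DataFrame({
    "user_id": np.arange(n_rows),
    "mood": np.zeros(n_rows),  # hypothetical second column
})


def with_itertuples():
    # row[0] is the index, so row[1] is the first column (user_id).
    return [row[1] + 1 for row in frame.itertuples()]


def with_column_loop():
    # Plain Python loop over a single column (a Series), not over the whole frame.
    return [value + 1 for value in frame["user_id"]]


def with_apply():
    return frame["user_id"].apply(lambda x: x + 1)


def with_vectorised():
    # One operation on the whole column, no Python-level loop.
    return frame["user_id"] + 1


candidates = {
    "itertuples": with_itertuples,
    "column loop": with_column_loop,
    "apply": with_apply,
    "vectorised": with_vectorised,
}

# Best of 3 runs, one call per run, for each approach.
timings = {
    name: min(timeit.repeat(func, number=1, repeat=3))
    for name, func in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>12}: {seconds * 1000:.1f} ms")
```

Note that `itertuples` builds a tuple from every column of the frame, while the column loop only touches `user_id`, so the two are not doing the same amount of work per row.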

I don't know what the point of this is, but you can just do `toMood['user_id'] += 1`, which is a vectorised method — EdChum, Aug 01 '16 at 14:53
That's not a fair comparison. For the *regular* loop, you are slicing a series first and looping over the series. For an equivalent of itertuples you need to iterate over the dataframe. And these are too small to draw a conclusion from. Those timings include many things. In order to isolate the iterations you need a big dataframe, in my opinion. — ayhan, Aug 01 '16 at 14:57
For a general comparison of loop and loop-like methods, I usually refer to [this answer](http://stackoverflow.com/a/24871316/5276797). — IanS, Aug 01 '16 at 16:29
I have never understood what 'best of 3' means either. Perhaps average of the best 3 iterations? Maybe @EdChum can confirm... — IanS, Aug 01 '16 at 16:31
@IanS indeed this is correct: https://docs.python.org/2/library/timeit.html. Basically, there is a threshold with regards to the number of loops it will perform: if it's too long then it will only do a single loop, then it will do 3, and if it's really fast then 100/1000 loops — EdChum, Aug 01 '16 at 16:44
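As far as I can tell from the timeit documentation, "10 loops, best of 3" is closer to "the fastest of three repeated runs, each run executing the statement 10 times" than to an average. A minimal sketch with the standard-library `timeit` module, where `add_one_to_each` is a made-up workload and the loop and repeat counts are fixed by hand rather than chosen automatically as `%%timeit` does:

```python
import timeit


def add_one_to_each():
    # Hypothetical workload standing in for the timed cell.
    return [x + 1 for x in range(1000)]


loops = 10    # how many times the statement runs inside one timing run
repeats = 3   # how many timing runs are made

# One total duration (in seconds) per run, each covering `loops` executions.
totals = timeit.repeat(add_one_to_each, number=loops, repeat=repeats)

# Convert each run's total into a per-loop time and keep the fastest run.
per_loop = [total / loops for total in totals]
best = min(per_loop)

print(f"{loops} loops, best of {repeats}: {best * 1e6:.1f} us per loop")
```

The minimum is reported because slower runs are usually caused by other activity on the machine rather than by the code being timed.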
