I am trying to run df.apply on date objects, but it is far too slow!
My %prun output gives:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1999   14.563    0.007   14.563    0.007 {pandas.tslib.array_to_timedelta64}
    13998    0.103    0.000   15.221    0.001 series.py:126(__init__)
     9999    0.093    0.000    0.093    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   272012    0.093    0.000    0.125    0.000 {isinstance}
     5997    0.089    0.000    0.196    0.000 common.py:199(_isnull_ndarraylike)
So basically it takes about 14 seconds for an array of length 2000. My actual array has more than 100,000 entries, which translates to a run time of roughly 15 minutes or more.
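As a back-of-the-envelope check on that estimate, here is the arithmetic using only the numbers from the profile above. It assumes the cost scales linearly with the number of rows, which I have not verified, and it counts only the time spent inside array_to_timedelta64.

```python
# Numbers taken from the %prun output above.
tottime_s = 14.563   # total time spent in array_to_timedelta64
n_calls = 1999       # number of calls, one per row processed

# Assumption: cost grows linearly with the number of rows.
per_call_s = tottime_s / n_calls
target_rows = 100000
estimated_min = per_call_s * target_rows / 60.0

print("%.2f ms per call" % (per_call_s * 1000))
print("%.1f minutes for %d rows" % (estimated_min, target_rows))
```

That is about 12 minutes for exactly 100,000 rows from this one function alone, before counting any other overhead.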
Why does pandas call pandas.tslib.array_to_timedelta64 at all, given that it is the bottleneck? I really don't understand why this function call is necessary. Both operands of the subtraction have the same data type: I explicitly converted them beforehand using pd.to_datetime(). And no, that conversion time is not included in the timings above.
So, all in all, you can understand my frustration with this code!
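The actual code looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(bet_endtimes)
def testing():
    # For each bet end time, find the position of the closest timestamp in currentdata['date'].
    close_indices = df.apply(lambda x: np.argmin(np.abs(currentdata['date'] - x[0])), axis=1)
    print(close_indices)
%prun testing()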
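To make clear what I am computing, here is a minimal self-contained sketch of the same nearest-timestamp lookup done on plain NumPy arrays instead of row by row through df.apply. The data is synthetic and stands in for my real currentdata and bet_endtimes, and the helper name nearest_indices is made up for illustration. I have not profiled this against my real data, so treat it as an assumption about a possible workaround rather than a confirmed fix.

```python
import numpy as np
import pandas as pd

def nearest_indices(dates, targets):
    """For each target, return the position of the closest entry in dates.

    Hypothetical helper, not part of pandas or NumPy.
    """
    # Convert both inputs to int64 nanoseconds so the subtraction is plain integer math.
    dates_ns = np.asarray(dates, dtype="datetime64[ns]").astype("int64")
    targets_ns = np.asarray(targets, dtype="datetime64[ns]").astype("int64")
    # Broadcast to a (len(targets), len(dates)) matrix of absolute differences.
    diffs = np.abs(targets_ns[:, None] - dates_ns[None, :])
    return diffs.argmin(axis=1)

# Synthetic stand-ins for the real data (assumptions, not my actual arrays).
currentdata = pd.DataFrame({"date": pd.to_datetime([
    "2014-01-01 00:00", "2014-01-01 01:00", "2014-01-01 02:00",
    "2014-01-01 03:00", "2014-01-01 04:00",
])})
bet_endtimes = pd.to_datetime([
    "2014-01-01 00:20", "2014-01-01 02:40", "2014-01-01 03:59",
])

close_indices = nearest_indices(currentdata["date"].values, bet_endtimes.values)
print(close_indices)  # positions 0, 3 and 4 for the sample data
```

Note that the broadcast builds a full len(targets) by len(dates) matrix, so with 100,000+ entries it may need to be processed in chunks to keep memory use reasonable.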