Python fastest group interpolation

Asked Nov 26 '18 at 21:00

Active Jul 08 '19 at 18:33

Viewed 874 times

In Python I have data that looks like this:

import numpy as np
import pandas as pd

data = pd.DataFrame({'currency': ['EUR', 'EUR', 'EUR', 'EUR', 'USD', 'USD', 'USD', 'USD'],
                     'tenor': [1, 2, 5, 10, 1, 2, 5, 10],
                     'value': [10, 20, np.nan, 100, 1, 2, 3, np.nan]})

I want to group by currency and linearly interpolate the NaNs in value based on tenor, i.e. I want to achieve:

data.index = data['tenor']
# transform keeps the original row order and index, so the result assigns back directly
data['value'] = data.groupby('currency')['value'].transform(lambda x: x.interpolate(method='values'))

The issue is that interpolate in pandas is very slow: I have several thousand groups in the groupby, and the entire DataFrame has 10 million rows.

Is there a fast way to do interpolation over groups? I tried NumPy:

result = data.groupby('currency')[['tenor', 'value']].apply(lambda x: list(fun(x['tenor'].values, x['value'].values)))

where

def fun(x, y):
    # Interpolate y at every x, using only the non-NaN points as knots
    isNaN = np.isnan(y)
    return np.interp(x, x[~isNaN], y[~isNaN])

which is faster, but not by much.
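
One way to trim the per-group pandas overhead of the NumPy attempt might be to skip groupby.apply and slice the raw arrays at the group boundaries instead. This is only a sketch, assuming the rows of each group are stored contiguously and tenor is sorted within each group; interp_by_slices and the integer codes array are hypothetical names, not part of the original code.

```python
import numpy as np

def interp_by_slices(codes, x, y):
    """Run np.interp on each contiguous run of equal group codes."""
    out = np.array(y, dtype=float)  # copy, so the input is left untouched
    # Positions where a new group starts, plus both ends of the array
    breaks = np.flatnonzero(codes[1:] != codes[:-1]) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [len(codes)]))
    for lo, hi in zip(starts, stops):
        ys = out[lo:hi]  # a view, so writes land in `out`
        nan = np.isnan(ys)
        if nan.any() and not nan.all():
            xs = x[lo:hi]
            ys[nan] = np.interp(xs[nan], xs[~nan], ys[~nan])
    return out

# Same data as in the question, with currency encoded as 0 (EUR) and 1 (USD)
codes = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tenor = np.array([1, 2, 5, 10, 1, 2, 5, 10], dtype=float)
value = np.array([10, 20, np.nan, 100, 1, 2, 3, np.nan])
result = interp_by_slices(codes, tenor, value)
print(result)
```

Like fun, this clamps leading and trailing NaNs to the nearest valid value in the group, because that is what np.interp does outside the range of its knots. It still loops in Python once per group, so it only removes the pandas overhead, not the loop itself.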

Can you recommend the fastest within-group interpolation in Python?

edited Jul 08 '19 at 18:33 by kevins_1

asked Nov 26 '18 at 21:00 by user2743931

https://stackoverflow.com/questions/37057187/pandas-interpolate-within-a-groupby – MisterMonk Nov 26 '18 at 21:07
I've seen this post already. My numpy implementation is faster than any of the proposed methods, yet still not fast enough. Any better ideas? – user2743931 Nov 26 '18 at 21:28
Write it in C/C++ ;) No, I always run into the same problem. Did you check with line_profiler where the bottleneck is? You can also use dask - https://dask.org/ – MisterMonk Nov 26 '18 at 21:37
Can we assume that `data['tenor']` is sorted within each group? – Michael Nov 27 '18 at 03:00
@Michael yes, sorted – user2743931 Nov 29 '18 at 18:49
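
Since tenor is confirmed to be sorted within each group, one possible direction is to drop the per-group loop entirely and locate, for every row, the nearest valid row before and after it with cumulative max/min scans. The following is a minimal sketch under stated assumptions, not a benchmarked solution: it assumes the rows of each group are contiguous, and interp_within_groups and the integer codes array are hypothetical names introduced for illustration.

```python
import numpy as np

def interp_within_groups(codes, x, y):
    """Linear interpolation of NaNs in y against x, never crossing group boundaries."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.arange(n)
    valid = ~np.isnan(y)
    # Index of the nearest valid row at or before each row (-1 if none)
    prev = np.maximum.accumulate(np.where(valid, idx, -1))
    # Index of the nearest valid row at or after each row (n if none)
    nxt = np.minimum.accumulate(np.where(valid, idx, n)[::-1])[::-1]
    prev_c = np.clip(prev, 0, n - 1)
    nxt_c = np.clip(nxt, 0, n - 1)
    # A neighbour only counts if it exists and belongs to the same group
    has_prev = (prev >= 0) & (codes[prev_c] == codes)
    has_next = (nxt < n) & (codes[nxt_c] == codes)
    x0, y0 = x[prev_c], y[prev_c]
    x1, y1 = x[nxt_c], y[nxt_c]
    dx = x1 - x0
    with np.errstate(divide='ignore', invalid='ignore'):
        # dx == 0 happens on rows that are already valid; weight 0 keeps their value
        w = np.where(dx != 0, (x - x0) / dx, 0.0)
    out = np.full(n, np.nan)
    both = has_prev & has_next
    out[both] = (y0 + w * (y1 - y0))[both]
    # Group edges: clamp to the only available neighbour, as np.interp does
    only_prev = has_prev & ~has_next
    only_next = has_next & ~has_prev
    out[only_prev] = y0[only_prev]
    out[only_next] = y1[only_next]
    return out

# Same data as in the question, with currency encoded as 0 (EUR) and 1 (USD)
codes = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tenor = np.array([1, 2, 5, 10, 1, 2, 5, 10])
value = np.array([10, 20, np.nan, 100, 1, 2, 3, np.nan])
filled = interp_within_groups(codes, tenor, value)
print(filled)
```

On a DataFrame, the codes could be obtained with `pd.factorize(data['currency'])[0]`. If the groups are not already contiguous, a stable sort by currency beforehand would be needed. Groups that contain no valid value at all stay NaN.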
