1

In Python I have data that looks like this:

data = pd.DataFrame({'currency':['EUR', 'EUR', 'EUR', 'EUR', 'USD', 'USD', 'USD', 'USD'], 
                   'tenor':[1, 2, 5, 10, 1, 2, 5, 10],
                   'value':[10, 20, np.nan, 100, 1, 2, 3, np.nan]})

I want to group by currency and linearly interpolate NaNs based on tenor, i.e. I want to achieve

data.index = data['tenor']
data['value'] = data.groupby('currency')['value'].apply(lambda x: x.interpolate('values'))

The issue is that interpolate in pandas is very slow. I have several thousands groups within groupby and the entire dataframe has 10 million rows.

Is there a fast way to do interpolation over groups? I tried numpy

result = data.groupby('currency')[['tenor', 'value']].apply(lambda x: list(fun(x['tenor'].values, x['value'].values)))

where

def fun(x, y):
    isNaN = np.isnan(y)
    return np.interp(x, x[~isNaN], y[~isNaN])

which is faster but not so much.

Can you recommend the fastest within-group interpolation in Python?

kevins_1
  • 1,268
  • 2
  • 9
  • 27
user2743931
  • 174
  • 9

0 Answers0