In Python I have data that looks like this:
data = pd.DataFrame({'currency':['EUR', 'EUR', 'EUR', 'EUR', 'USD', 'USD', 'USD', 'USD'],
'tenor':[1, 2, 5, 10, 1, 2, 5, 10],
'value':[10, 20, np.nan, 100, 1, 2, 3, np.nan]})
I want to group by currency and linearly interpolate NaNs based on tenor, i.e. I want to achieve
data.index = data['tenor']
data['value'] = data.groupby('currency')['value'].apply(lambda x: x.interpolate('values'))
The issue is that interpolate in pandas is very slow. I have several thousands groups within groupby and the entire dataframe has 10 million rows.
Is there a fast way to do interpolation over groups? I tried numpy
result = data.groupby('currency')[['tenor', 'value']].apply(lambda x: list(fun(x['tenor'].values, x['value'].values)))
where
def fun(x, y):
isNaN = np.isnan(y)
return np.interp(x, x[~isNaN], y[~isNaN])
which is faster but not so much.
Can you recommend the fastest within-group interpolation in Python?