resampling pandas series with numeric index

Question

Suppose I have a pandas.Series whose index holds numeric values, e.g.

pd.Series( [10,20], [1.1, 2.3] )

How can I resample the above series at an interval of 0.1? It looks like .resample only works on datetime intervals.

TomAugspurger · Accepted Answer · 2014-03-04T18:23:08.437

6

That goes by the name of interpolation. You can think of resampling as a special case of interpolation.

In [24]: new_idx = s.index.union(pd.Index(np.arange(1.1, 2.3, .01)))

In [25]: s.reindex(new_idx).interpolate().head()
Out[25]: 
1.10    10.000000
1.11    10.083333
1.12    10.166667
1.13    10.250000
1.14    10.333333
dtype: float64

In [26]: s.reindex(new_idx).interpolate().tail()
Out[26]: 
2.26    19.666667
2.27    19.750000
2.28    19.833333
2.29    19.916667
2.30    20.000000
dtype: float64

We need new_idx to be a union of the original index and the values we want to interpolate, so that the original index isn't dropped.

Have a look at the interpolation methods: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html
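
For the 0.1 spacing asked about in the question, the same reindex-then-interpolate idea can be sketched as below. This is only a sketch under a few assumptions: the names grid and resampled are made up for illustration, the grid is rounded to one decimal so that its points compare equal to 1.1 and 2.3, and method="index" is chosen so the interpolation uses the actual index values instead of assuming evenly spaced rows.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

# Target grid at 0.1 spacing. The stop value is nudged past 2.3 so the
# endpoint is included, and rounding removes floating-point drift such as
# 1.2000000000000002 (both choices are assumptions of this sketch).
grid = np.round(np.arange(1.1, 2.3 + 0.05, 0.1), 1)

# The union keeps the original index values alongside the grid points.
new_idx = s.index.union(pd.Index(grid))

# method="index" interpolates linearly with respect to the index values.
resampled = s.reindex(new_idx).interpolate(method="index")
print(resampled)
```

With the default method (linear), rows are treated as equally spaced, which gives the same result here only because the grid itself is evenly spaced.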

edited Mar 04 '14 at 18:23

answered Mar 04 '14 at 18:07

TomAugspurger


I think the 2.3 is dropped when reindexing, hence why the result isn't increasing... – Andy Hayden Mar 04 '14 at 18:20
Nice catch. I should have looked closer. Thoughts on the `reindex().interpolate()` API versus `s.reindex(at=new_idx)`? Originally it was mostly to make `df.interpolate()` easier to write. I can see arguments either way. – TomAugspurger Mar 04 '14 at 18:25
@AndyHayden and tom: are floats reliable as index values now? – Paul H Mar 04 '14 at 18:25
They are supported with the `Float64Index` now. But there are still some tricky issues so I try to avoid them when possible. – TomAugspurger Mar 04 '14 at 18:29
I think using cut is more stable, was hoping you/someone knew a better way! Floats are fiddly/sensitive as we've just found, they're sometimes useful though... just handle with care. – Andy Hayden Mar 04 '14 at 18:43

score 1 · Answer 2 · answered Mar 04 '14 at 18:41

One option is to use cut to bin this data (much less elegant than a resample but here goes):

In [11]: cat, retbins = pd.cut(s.index, np.arange(1, 3, 0.1), retbins=True)

In [12]: s.index = retbins[cat.codes]

In [13]: s
Out[13]: 
1.0    10
2.2    20
dtype: int64

Say you wanted to resample with how='sum':

In [14]: s = s.groupby(s.index).sum()

In [15]: s = s.reindex(retbins)

There are a lot of NaNs now. As Tom suggests, you can interpolate:

In [16]: s.interpolate()
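
Put together as one script, the steps above might look like the following. Treat it as a sketch and not the definitive way to do it: the names binned, summed and filled are made up here, the index is passed to cut as a plain array, and the bin codes are read from the codes attribute of the resulting categorical.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

bins = np.arange(1, 3, 0.1)

# With the default right=True each value falls into a bin (left, right].
# cat.codes holds the position of that bin, so retbins[cat.codes] is the
# left edge of the bin each index value landed in.
cat, retbins = pd.cut(s.index.to_numpy(), bins, retbins=True)

binned = s.copy()
binned.index = retbins[cat.codes]

# Aggregate values that share a bin, then expand onto every bin edge.
summed = binned.groupby(level=0).sum()
filled = summed.reindex(retbins).interpolate()
print(filled)
```

Note that 1.1 sits on a bin edge, so it is labelled with the left edge 1.0; that is the float sensitivity mentioned in the comments.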

score 0 · Answer 3 · answered Mar 04 '14 at 17:50

Well, I don't think you can have a non-integer index, because of float comparison issues. With a step of .1, however, you could do something like:

create a new df = pd.DataFrame(index=range(100, 201)) [the ones will now represent .1]
set the values at 100 (originally 10) and 200 (originally 20) to 1.1 and 2.3
df.fillna(method='pad', inplace=True)

also, it seems like you don't even really need to use the index at all, you just want the gaps between the data...
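
One possible reading of the integer-index idea, as a rough sketch: scale the index by 10 so that one integer step stands for 0.1, fill on the integer grid, then scale back. The scale factor and the names scaled, full, padded and linear are assumptions for illustration; ffill is used as the equivalent of the 'pad' fill.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

# Scale by 10 and round to exact integers: 1.1 -> 11, 2.3 -> 23.
scaled = s.copy()
scaled.index = np.round(s.index.to_numpy() * 10).astype(int)

# Reindex onto every integer step between the first and last point.
steps = np.arange(scaled.index.min(), scaled.index.max() + 1)
full = scaled.reindex(steps)

padded = full.ffill()         # carry the last known value forward ('pad')
linear = full.interpolate()   # or interpolate linearly between known points

# Map the integer steps back to the original units.
linear.index = linear.index / 10
print(linear)
```

Working on integers sidesteps exact float comparisons while reindexing; the floats only come back in the final division.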
