
Suppose I have a pandas.Series whose index holds numeric values, e.g.

pd.Series([10, 20], index=[1.1, 2.3])

How do we resample the above series at a 0.1 interval? It looks like .resample only works on datetime-like indexes.

John

3 Answers


That goes by the name of interpolation. You can think of resampling as a special case of interpolation.

In [24]: new_idx = s.index.union(pd.Index(np.arange(1.1, 2.3, .01)))

In [25]: s.reindex(new_idx).interpolate().head()
Out[25]: 
1.10    10.000000
1.11    10.083333
1.12    10.166667
1.13    10.250000
1.14    10.333333
dtype: float64

In [26]: s.reindex(new_idx).interpolate().tail()
Out[26]: 
2.26    19.666667
2.27    19.750000
2.28    19.833333
2.29    19.916667
2.30    20.000000
dtype: float64

We need new_idx to be a union of the original index and the values we want to interpolate, so that the original index isn't dropped.

Have a look at the interpolation methods: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html
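For reference, here is a minimal sketch of the same idea at the 0.1 spacing the question asks for. The grid built with `np.linspace`, the rounding step, and the choice of `method="index"` are assumptions for illustration, not part of the session above; the default `interpolate()` spaces values by row position rather than by index value.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

# Target grid 1.1, 1.2, ..., 2.3 (13 points); rounding removes float drift
# so the grid endpoints compare equal to the original labels 1.1 and 2.3.
grid = pd.Index(np.round(np.linspace(1.1, 2.3, 13), 1))

# The union keeps the original index labels alongside the new grid points.
new_idx = s.index.union(grid)

# method="index" interpolates linearly using the numeric index values.
resampled = s.reindex(new_idx).interpolate(method="index")
print(resampled)
```

If the grid does not happen to contain the original labels, the union still keeps them, and you can select just the grid afterwards with `resampled.loc[grid]`.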

TomAugspurger
  • I think the 2.3 is dropped when reindexing, hence why the result isn't increasing... – Andy Hayden Mar 04 '14 at 18:20
  • Nice catch. I should have looked closer. Thoughts on the `reindex().interpolate()` API versus `s.reindex(at=new_idx)`? Originally it was mostly to make `df.interpolate()` easier to write. I can see arguments either way. – TomAugspurger Mar 04 '14 at 18:25
  • @AndyHayden and tom: are floats reliable as index values now? – Paul H Mar 04 '14 at 18:25
  • They are supported with the `Float64Index` now. But there are still some tricky issues so I try to avoid them when possible. – TomAugspurger Mar 04 '14 at 18:29
  • I think using cut is more stable, was hoping you/someone knew a better way! Floats are fiddly/sensitive as we've just found, they're sometimes useful though... just handle with care. – Andy Hayden Mar 04 '14 at 18:43

One option is to use cut to bin this data (much less elegant than a resample but here goes):

In [11]: cat, retbins = pd.cut(s.index, np.arange(1, 3, 0.1), retbins=True)

In [12]: s.index = retbins[cat.codes]

In [13]: s
Out[13]: 
1.0    10
2.2    20
dtype: int64

Say you wanted to resample with how='sum':

In [14]: s = s.groupby(s.index).sum()

In [15]: s = s.reindex(retbins)

There are a lot of NaNs now; you can, as Tom suggests, interpolate:

In [16]: s.interpolate()
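
Put together as a script, the binning approach might look roughly like the sketch below. Using `labels=False` to get integer bin codes, rounding the bin edges, and the variable names are assumptions made for illustration; the steps mirror the session above (bin, aggregate with sum, reindex onto the edges, interpolate).

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

# Bin edges 1.0, 1.1, ..., 2.9; rounding keeps the edges at clean decimals.
edges = np.round(np.arange(1, 3, 0.1), 1)

# labels=False returns, for each index value, the integer position of the
# (left, right] bin it falls into.
codes = pd.cut(s.index.to_numpy(), edges, labels=False)

# Relabel every point with the left edge of its bin.
binned = pd.Series(s.to_numpy(), index=edges[codes])

# Aggregate points that share a bin (the how='sum' case), then lay the
# result out on the full grid of edges and fill the gaps.
summed = binned.groupby(level=0).sum()
filled = summed.reindex(edges).interpolate()
print(filled)
```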
Andy Hayden

Well, I don't think you can reliably use a non-integer index, because of float comparison issues. With a step of .1, however, you could do something like:

  • create a new df = pd.DataFrame(index=range(100, 201)) [each integer step will now represent .1]
  • set the values at 100 (originally 10) and 200 (originally 20) to 1.1 and 2.3
  • df.ffill(inplace=True)
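
One possible reading of the steps above, sketched with a Series instead of a DataFrame: scale the float index by 10 so that 1.1 becomes 11 and 2.3 becomes 23, pad on the integer grid, then scale back. The scale factor, the 11 to 23 range, and the variable names are assumptions for illustration and differ from the numbers in the list.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20], index=[1.1, 2.3])

# Move to an integer index: each integer step now stands for 0.1.
scaled = s.copy()
scaled.index = np.rint(s.index.to_numpy() * 10).astype(int)  # 1.1 -> 11, 2.3 -> 23

# Reindex onto every integer step between the two points.
full = scaled.reindex(np.arange(11, 24))

# Pad (forward-fill) the gaps, as the last step in the list suggests.
padded = full.ffill()

# Convert the index back to the original float scale.
padded.index = padded.index / 10
print(padded)
```

Swapping `ffill()` for `interpolate()` would give a linear ramp between the two points instead of a step.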

also, it seems like you don't even really need to use the index at all, you just want the gaps between the data...

acushner