0

Is there any function in pandas to simulate excel formula like '=sum($A$1:A10'(for 10th row), i.e. the formula should take rolling data from 1st row.

Pandas rolling function needs a integer value as window argument.

Chandan Kumar
  • 127
  • 1
  • 2
  • 8
  • Please check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Sep 21 '16 at 11:55
  • Instead of the 30 argument for rolling window for rolling, is there any way to make it dynamic(row number). rolling(30).apply(func=np.percentile,args=(0.90,)). Like the excel $A$1:current cell formulae? – Chandan Kumar Sep 21 '16 at 12:00
  • I am not sure, but it looks like need loop. – jezrael Sep 21 '16 at 12:04

1 Answers1

2

The equivalent of =SUM($A$1:A1) in pandas is .expanding().sum() (requires pandas 0.18.0):

ser = pd.Series([1, 2, 3, 4])

ser
Out[3]: 
0    1
1    2
2    3
3    4
dtype: int64

ser.expanding().sum()
Out[4]: 
0     1.0
1     3.0
2     6.0
3    10.0

You can also apply a generic function via apply:

ser.expanding().apply(lambda x: np.percentile(x, 90))
Out: 
0    1.0
1    1.9
2    2.8
3    3.7
dtype: float64

Or directly with quantile:

ser.expanding().quantile(0.9)
Out[15]: 
0    1.0
1    1.0
2    2.0
3    3.0
dtype: float64

Note that 90th percentile is equal to 0.9th quantile. However, Series.quantile and Series.expanding.quantile are returning different results which is probably a bug.

np.percentile returns the same results as Excel's PERCENTILE.INC. For PERCENTILE.EXC, I've previously wrote a small function here.

Community
  • 1
  • 1
ayhan
  • 70,170
  • 20
  • 182
  • 203
  • Thanks..this looks like exactly what is needed but the percentile values dont match with excel values(can see that they are not correct even visually). percentile(0.9) should be 1 1.93 2.86 3.79 – Chandan Kumar Sep 21 '16 at 12:12
  • @ChandanKumar For 90th percentile you need to pass 90 as the parameter. However, Series.expanding.quantile seems buggy so it might be better to use numpy version right now. – ayhan Sep 21 '16 at 18:37