
I am trying to access n rows of a dataframe at a time and compute a statistic over them (a mean, or, as in the code below, a regression slope). The objective is to avoid a for loop: my df has 30k rows, and a loop may be slow. So I would like to use a pandas function to compute this over every n rows.

My code:

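import numpy as np
import pandas as pd
from scipy import stats

dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2  # window size: number of samples per regression
cl_id = dfx.columns.tolist().index('A')  # positional index of column 'A', for use with .iloc
# Slope of linregress(window, [1, 2]) for each window ending at row x; the first n entries are padded with 'NaN'
l1 = ['NaN']*n + [stats.linregress(dfx.iloc[x+1-n:x+1, cl_id].tolist(), [1, 2])[0] for x in np.arange(n, len(dfx))]
dfx['slope'] = l1
print(dfx)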
      A      slope
0  10.0        NaN
1  20.0        NaN  #stats.linregress([10,20],[1,2])[0] is missing here. Why?
2  15.0       -0.2  #stats.linregress([20,15],[1,2])[0] = -0.2
3  30.0  0.0666667  #stats.linregress([15,30],[1,2])[0] = 0.06667
4   1.5 -0.0350877
5   0.6   -1.11111
6   7.0    0.15625
7   0.8   -0.16129
8  90.0  0.0112108
9  10.0    -0.0125

Apart from that, everything works fine. Is there a more Pythonic way of doing this, for example with the rolling() function?

Mainland
  • one of your questions is why your new series starts with two `NaN`s, and the answer is because you are prepending those two values with `l1=['NaN']*n+[...` – RichieV Sep 04 '20 at 15:06
  • Does this answer your question? [Pandas - Rolling slope calculation](https://stackoverflow.com/questions/42138357/pandas-rolling-slope-calculation) – RichieV Sep 04 '20 at 15:09
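
On the missing value at row 1: with a window of n rows, only the first n-1 rows lack a full window, but the code in the question pads with n placeholders and starts the loop at row n, so the window ending at row 1 is never computed. Below is a minimal sketch of the corrected loop, assuming the same `dfx` and `n` as in the question. Here `np.nan` replaces the 'NaN' strings so the column stays numeric, and `col` and `slopes` are names chosen only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2
col = dfx.columns.get_loc('A')  # positional index of column 'A'

# Only the first n-1 rows lack a full window, so pad with n-1 NaNs
# and start the loop at row n-1 instead of row n.
slopes = [np.nan] * (n - 1) + [
    stats.linregress(dfx.iloc[x + 1 - n:x + 1, col].to_numpy(),
                     np.arange(1, n + 1))[0]
    for x in range(n - 1, len(dfx))
]
dfx['slope'] = slopes
print(dfx)
```

With this change row 1 gets the slope of linregress([10, 20], [1, 2]), which is 0.1. The loop itself is still a Python-level loop, so it does not address the speed concern.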

1 Answer

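n = 2
# Regress each window's row labels on its values; raw=False passes each window as a Series, so x.index is available
dfx.A.rolling(n).apply(lambda x: stats.linregress(x, x.index+1)[0], raw=False)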

Output:

0         NaN
1    0.100000
2   -0.200000
3    0.066667
4   -0.035088
5   -1.111111
6    0.156250
7   -0.161290
8    0.011211
9   -0.012500
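
Since rolling().apply() still calls a Python function once per window, it may remain slow on 30k rows. A possible alternative, offered as a sketch rather than a drop-in replacement: the least-squares slope of y regressed on x equals cov(x, y) / var(x), so the same numbers can be obtained from the built-in rolling covariance and variance. The helper series `pos` (row positions) is an assumption introduced here; shifting it by a constant does not change the slope.

```python
import numpy as np
import pandas as pd

dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2

# Row positions play the role of the y values ([1, 2] in the question).
pos = pd.Series(np.arange(len(dfx)), index=dfx.index, dtype=float)

# Slope of y regressed on x = cov(x, y) / var(x), evaluated per window.
roll = dfx['A'].rolling(n)
dfx['slope'] = roll.cov(pos) / roll.var()
print(dfx)
```

A window whose values are all equal has zero variance, so the division yields inf or NaN there. It is worth checking for that case if the data can contain flat stretches.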
Mohsin hasan