I need to replace all pandas functions with NumPy equivalents, but the pandas documentation does not explain clearly how pd.Series.autocorr() is implemented.
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'A': np.random.random(20)})

# Rolling lag-1 autocorrelation with pandas; raw=False passes each window
# as a Series, which is needed to call .autocorr() on it
x = df.rolling(5).apply(lambda w: w.autocorr(), raw=False).dropna()

# My NumPy attempt: correlate each 5-element slice with the slice shifted by one
y = []
for i in range(15):
    y.append(np.corrcoef(df['A'][i:i+5], df['A'][i+1:i+6])[0, 1])
    # np.correlate(df['A'][i:i+5]-df['A'][i:i+5].mean(),df['A'][(1+i):(6+i)]-df['A'][(1+i):(6+i)].mean(),'valid')[0]
    # np.correlate(df['A'][i:i+5]-df['A'][i:i+5].mean(),np.flip(df['A'][(1+i):(6+i)])-df['A'][(1+i):(6+i)].mean(),'valid')[0]
The pd.Series.autocorr() result is quite different from that of np.corrcoef() (I tried np.correlate() as well).
Is there any way to get the same result as pd.Series.autocorr() using only NumPy functions?
----------------- Example result added ----------------
df['A'] = [0.5314742325906894, 0.7424912257400176, 0.2895649008872213, 0.16967710120380175, 0.5157732179121193, 0.8733423106397956, 0.585705172096987, 0.1387299202733231, 0.18540514459343538, 0.13913104211564564, 0.736937228263526, 0.20944078980434988, 0.2826810751427198, 0.15055686873748197, 0.4159491505728884, 0.07600226975854041, 0.15279939462562298, 0.1405723553409276, 0.8372449734938123, 0.3314986851097367]
x = [0.010637545587524432, 0.03594106077726333, 0.40104877005219836, -0.009106549297130558, 0.4008385963492408, 0.7794761931857483, -0.4182779136016351, -0.2962696925038811, -0.4083361773384266, -0.5244693987698964, -0.5063605533618415, -0.9496936641021706, -0.5303040575891907, -0.42881675192105184, -0.3371366910961831, -0.036231529863559424]
y = [0.11823200733266746, 0.16166841984627847, 0.2033980627120384, 0.2861039403548347, 0.5239653859040245, 0.1602079943122044, -0.3920837265006942, -0.28176746883177917, -0.3604612671108854, -0.5347077109231272, -0.4702461092101919, -0.5287673078857449, -0.4501452367448014, -0.3538574959825232, -0.10013342594129321]
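One likely source of the mismatch, sketched here as a guess rather than a confirmed answer: Series.autocorr(lag=1) is the Pearson correlation between a Series and itself shifted by one, and the NaN pair created by the shift is dropped. Inside a rolling window of 5 that leaves only 4 pairs, all taken from the same window, whereas the loop above correlates two full 5-element slices that reach one element past the window. The helper name rolling_autocorr below is hypothetical, and the sketch assumes lag >= 1, no NaNs in the input, and NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def rolling_autocorr(values, window=5, lag=1):
    """Hypothetical helper: lag-`lag` autocorrelation computed inside each window.

    Assumes lag >= 1 and that `values` contains no NaNs.
    """
    values = np.asarray(values, dtype=float)
    # One row per window position, shape (len(values) - window + 1, window)
    windows = sliding_window_view(values, window)
    # Pair w[lag:] with w[:-lag], so each correlation uses window - lag points
    return np.array([np.corrcoef(w[lag:], w[:-lag])[0, 1] for w in windows])


# First six values of df['A'] from the example above, giving two windows
a = [0.5314742325906894, 0.7424912257400176, 0.2895649008872213,
     0.16967710120380175, 0.5157732179121193, 0.8733423106397956]
result = rolling_autocorr(a)
print(result)
```

On these six values the two outputs should agree with the first two entries of x above (roughly 0.0106 and 0.0359), not with y.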