I have downloaded history prices for 2 stocks from Yahoo Finance, and merged the 2 data frames in order to compute the correlation of their close prices over different period of time (see the attached picture of the merged data frame):
2-day (intraday) 3-day 5-day
One way I am thinking of is to iterate the rows from the bottom of the data frame, and get subset of the 2 columns of Close_x and Close_y in the size of 2/3/5 rows and calculate the correlation, respectively. The calculated correlations will be added as columns to the merged data frame.
I am a novice to Pandas data frame, I think it's against the nature of data frame to iterate each row/column. I was wondering if there's a more efficient way to achieve my goal.
The color-coded boxes are:
- red: correlation over 2 days of close prices
- blue: correlation over 3 days ...
- green: correlation over 5 days ...
df = pd.DataFrame([[23.02000046, 23.13999939, 24.21999931, 26.70000076, 28.03000069],
[445.9200134, 446.9700012, 444.0400085, 439.1799927, 439.8599854]], columns = ['Close_x', 'Close_y'])
For the extracted data in the code above, the expected result would be
The correlation of the last 2 rows is 1:
The correlation of the last 3 rows is -0.8867: The correlation of the last 5 rows is -0.9510:
The final output will have the correlation coefficients as new columns. Adding the correlation coefficients as new columns, it will look like this:
Close_x Close_y 2D_Corr 3D_Corr 5D_Corr
23.02000046 445.9200134 ... ... ...
23.13999939 446.9700012 ... ... ...
24.21999931 444.0400085
26.70000076 439.1799927
28.03000069 439.8599854 1 -0.8867 -0.9510