0

I have downloaded history prices for 2 stocks from Yahoo Finance, and merged the 2 data frames in order to compute the correlation of their close prices over different period of time (see the attached picture of the merged data frame):

2-day (intraday) 3-day 5-day

One way I am thinking of is to iterate the rows from the bottom of the data frame, and get subset of the 2 columns of Close_x and Close_y in the size of 2/3/5 rows and calculate the correlation, respectively. The calculated correlations will be added as columns to the merged data frame.

I am a novice to Pandas data frame, I think it's against the nature of data frame to iterate each row/column. I was wondering if there's a more efficient way to achieve my goal.

2 stock prices with close prices over a period of time

The color-coded boxes are:

  1. red: correlation over 2 days of close prices
  2. blue: correlation over 3 days ...
  3. green: correlation over 5 days ...
df = pd.DataFrame([[23.02000046, 23.13999939, 24.21999931, 26.70000076, 28.03000069], 
[445.9200134, 446.9700012, 444.0400085, 439.1799927, 439.8599854]], columns = ['Close_x', 'Close_y'])

For the extracted data in the code above, the expected result would be

The correlation of the last 2 rows is 1:

The correlation of the last 3 rows is -0.8867: The correlation of the last 5 rows is -0.9510:

The final output will have the correlation coefficients as new columns. Adding the correlation coefficients as new columns, it will look like this:

    Close_x      Close_y    2D_Corr   3D_Corr    5D_Corr
    23.02000046  445.9200134     ...      ...         ...
    23.13999939  446.9700012     ...       ...         ...
    24.21999931  444.0400085               
    26.70000076  439.1799927
    28.03000069  439.8599854       1      -0.8867    -0.9510
TonyW
  • 18,375
  • 42
  • 110
  • 183
  • 1
    Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Aug 20 '21 at 04:58
  • 1
    You might want to check out [DataFrame.rolling](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html) and [this previous question](https://stackoverflow.com/q/35204460/16327476). – TMBailey Aug 20 '21 at 07:55

1 Answers1

1

As per TM Bailey's comment, you can use rolling:

Close_x = [23.02000046, 23.13999939, 24.21999931, 26.70000076, 28.03000069]
Close_y = [445.9200134, 446.9700012, 444.0400085, 439.1799927, 439.8599854]

s1 = pd.Series(Close_x)
s2 = pd.Series(Close_y)

s1.rolling(5).corr(s2)

output

Daniel Redgate
  • 219
  • 1
  • 6