I'm trying to calculate correlation coefficients between stocks and a set of other variables: the x-axis will be the time series of daily returns for Stock 1, Stock 2, Stock 3, ..., and the y-axis will be time-series returns on things like the S&P 500 index, 10-year yields, inflation, etc.
Roughly speaking, the data in the two CSVs look like the samples below. Note that some date rows may be missing (a date appears in xaxis.csv but not in yaxis.csv, or vice versa), and both files may contain blank cells.
xaxis.csv:
Date,Stock1,Stock2,Stock3
4/1/2010,1.01,0.81,0.64
5/1/2010,1.02,0.85,0.63
6/1/2010,1.0,0.83,0.65
yaxis.csv:
Date,STI Index,MASB10Y,USGG10YR Index
4/1/2010,2894.55,2.7,3.79
5/1/2010,2920.28,2.67,3.84
6/1/2010,2930.49,2.7,3.82
To achieve the above, I planned to create two DataFrames from the two CSV files (one for each axis's time-series data), set the date as the index, and then run corrwith().
However, my code below doesn't work. One of the errors raised was:
TypeError: unsupported operand type(s) for /: 'str' and 'float'
Has anyone encountered this issue, or have a better way for me to go about this task?
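A quick check that may help narrow this down: a 'str' operand in pct_change() usually means at least one column was read as text rather than numbers. The sketch below uses a made-up excerpt (the "-" placeholder and the names SAMPLE, text_cols and bad_cells are assumptions for illustration, not taken from the real files) to list the columns pandas did not parse as numeric and the cells responsible.

```python
import io
import pandas as pd

# Hypothetical excerpt in the shape of yaxis.csv; the "-" cell is invented
# to stand in for whatever non-numeric value the real file may contain.
SAMPLE = """Date,STI Index,MASB10Y
4/1/2010,2894.55,2.7
5/1/2010,2920.28,-
6/1/2010,2930.49,2.7
"""

df = pd.read_csv(io.StringIO(SAMPLE), index_col="Date")

# Columns that pandas did not parse as numbers.
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(text_cols)

# For each such column, collect the cells that cannot be converted to a number.
bad_cells = {}
for col in text_cols:
    as_number = pd.to_numeric(df[col], errors="coerce")
    bad = df.loc[as_number.isna() & df[col].notna(), col]
    bad_cells[col] = bad.to_dict()
print(bad_cells)
```

Running the same two steps against the real yaxis.csv (replacing the io.StringIO(...) argument with the file name) should show which columns and cells need cleaning, if this is indeed the cause.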
Code:
import pandas_datareader.data as web
import pandas as pd
import datetime as dt
import csv
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
import seaborn as sns
df1 = pd.read_csv('xaxis.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y', dayfirst=True)
df1.set_index('Date', inplace=True)
df1 = df1[((df1.index >= '2010-12-31') & (df1.index <= '2019-12-31'))]
df1_corr = df1.pct_change()
df2 = pd.read_csv('yaxis.csv')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y', dayfirst=True)
df2.set_index('Date', inplace=True)
df2 = df2[((df2.index >= '2010-12-31') & (df2.index <= '2019-12-31'))]
df2_corr = df2.pct_change()
ax = df1_corr.corrwith(df2_corr, axis=1)
print(ax)
Error:
Traceback (most recent call last):
\pandas\core\ops\array_ops.py", line 149, in na_arithmetic_op
result = expressions.evaluate(op, str_rep, left, right)
\pandas\core\computation\expressions.py", line 208, in evaluate
return _evaluate(op, op_str, a, b)
\pandas\core\computation\expressions.py", line 70, in _evaluate_standard
return op(a, b)
TypeError: unsupported operand type(s) for /: 'str' and 'float'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Terence Lee/PycharmProjects/stistockprice/Draft4.py", line 21, in <module>
df2_corr = df2.pct_change()
\pandas\core\generic.py", line 10086, in pct_change
rs = data.div(data.shift(periods=periods, freq=freq, axis=axis, **kwargs)) - 1
\pandas\core\ops\__init__.py", line 767, in f
new_data = left._combine_frame(right, pass_op, fill_value)
\pandas\core\frame.py", line 5300, in _combine_frame
new_data = ops.dispatch_to_series(self, other, _arith_op)
\pandas\core\ops\__init__.py", line 419, in dispatch_to_series
new_data = expressions.evaluate(column_op, str_rep, left, right)
\pandas\core\computation\expressions.py", line 208, in evaluate
return _evaluate(op, op_str, a, b)
\pandas\core\computation\expressions.py", line 70, in _evaluate_standard
return op(a, b)
\pandas\core\ops\__init__.py", line 388, in column_op
return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
\pandas\core\ops\__init__.py", line 388, in <dictcomp>
return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
\pandas\core\ops\common.py", line 64, in new_method
return method(self, other)
\pandas\core\ops\__init__.py", line 503, in wrapper
result = arithmetic_op(lvalues, rvalues, op, str_rep)
\pandas\core\ops\array_ops.py", line 197, in arithmetic_op
res_values = na_arithmetic_op(lvalues, rvalues, op, str_rep)
\pandas\core\ops\array_ops.py", line 151, in na_arithmetic_op
result = masked_arith_op(left, right, op)
\pandas\core\ops\array_ops.py", line 94, in masked_arith_op
result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for /: 'str' and 'str'
Process finished with exit code 1
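One possible direction, sketched under stated assumptions rather than as a confirmed fix: coerce every column to numeric so stray text becomes NaN, keep only the dates present in both files, and then compute returns. Note that corrwith() pairs columns by label, so with different column names in the two frames it would return NaN; the sketch therefore swaps in concat() followed by corr() and slices out the stock-versus-variable block. The sample rows beyond the three shown in the question, the "-" placeholder, and the helper name load_prices are all made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-ins for xaxis.csv and yaxis.csv. Rows after 6/1/2010,
# the blank cell and the "-" placeholder are invented for illustration.
X_CSV = """Date,Stock1,Stock2,Stock3
4/1/2010,1.01,0.81,0.64
5/1/2010,1.02,0.85,0.63
6/1/2010,1.0,0.83,0.65
7/1/2010,1.03,,0.66
8/1/2010,1.01,0.84,0.64
11/1/2010,1.04,0.86,0.67
"""
Y_CSV = """Date,STI Index,MASB10Y,USGG10YR Index
4/1/2010,2894.55,2.7,3.79
5/1/2010,2920.28,2.67,3.84
6/1/2010,2930.49,2.7,3.82
7/1/2010,2913.25,-,3.83
11/1/2010,2933.51,2.69,3.85
"""


def load_prices(source):
    """Read a CSV of levels, index it by date, and force all columns to numeric."""
    df = pd.read_csv(source)
    df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
    df = df.set_index("Date").sort_index()
    # Cells that are not numbers (such as "-") become NaN instead of staying text.
    return df.apply(pd.to_numeric, errors="coerce")


x = load_prices(io.StringIO(X_CSV))
y = load_prices(io.StringIO(Y_CSV))

# Keep only dates that appear in both files, then compute period returns
# without forward-filling the gaps.
prices = pd.concat([x, y], axis=1, join="inner").sort_index()
returns = prices.pct_change(fill_method=None)

# Full correlation matrix (NaN handled pairwise), sliced to stocks vs. variables.
corr_table = returns.corr().loc[x.columns, y.columns]
print(corr_table)
```

With the real files, the io.StringIO(...) arguments would be replaced by 'xaxis.csv' and 'yaxis.csv', and the date-range filter from the original code can be applied to prices before calling pct_change(). Whether dropping non-overlapping dates is appropriate, as opposed to filling them, depends on the data.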