I'm trying to calculate correlation coefficients between stocks and a set of other variables: the x-axis is the time series of daily returns for Stock 1, Stock 2, Stock 3, ..., and the y-axis is the time series of returns on things like the S&P 500 index, 10-year yields, inflation, etc.

Roughly speaking, the data in the two CSVs look like the samples below. Note that some date rows may be missing (a date appears in xaxis.csv but not in yaxis.csv, or vice versa), and both files contain some blank cells.

xaxis.csv:
Date,Stock1,Stock2,Stock3
4/1/2010,1.01,0.81,0.64
5/1/2010,1.02,0.85,0.63
6/1/2010,1.0,0.83,0.65


yaxis.csv:
Date,STI Index,MASB10Y,USGG10YR Index
4/1/2010,2894.55,2.7,3.79
5/1/2010,2920.28,2.67,3.84
6/1/2010,2930.49,2.7,3.82

To do this, I planned to create two DataFrames from the two CSV files (one per axis), set the date as the index, and then call corrwith().

However, my code below doesn't work. One of the issues flagged was:

TypeError: unsupported operand type(s) for /: 'str' and 'float'

Has anyone encountered this issue, or is there a better way to go about this task?

import pandas_datareader.data as web
import pandas as pd
import datetime as dt
import csv
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
import seaborn as sns

df1 = pd.read_csv('xaxis.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y', dayfirst=True)
df1.set_index('Date', inplace=True)
df1 = df1[((df1.index >= '2010-12-31') & (df1.index <= '2019-12-31'))]
df1_corr = df1.pct_change()


df2 = pd.read_csv('yaxis.csv')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y', dayfirst=True)
df2.set_index('Date', inplace=True)
df2 = df2[((df2.index >= '2010-12-31') & (df2.index <= '2019-12-31'))]
df2_corr = df2.pct_change()

ax = df1_corr.corrwith(df2_corr, axis=1)

print(ax)

Error:

Traceback (most recent call last):
  \pandas\core\ops\array_ops.py", line 149, in na_arithmetic_op
    result = expressions.evaluate(op, str_rep, left, right)
  \pandas\core\computation\expressions.py", line 208, in evaluate
    return _evaluate(op, op_str, a, b)
  \pandas\core\computation\expressions.py", line 70, in _evaluate_standard
    return op(a, b)
TypeError: unsupported operand type(s) for /: 'str' and 'float'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Terence Lee/PycharmProjects/stistockprice/Draft4.py", line 21, in <module>
    df2_corr = df2.pct_change()
  \pandas\core\generic.py", line 10086, in pct_change
    rs = data.div(data.shift(periods=periods, freq=freq, axis=axis, **kwargs)) - 1
  \pandas\core\ops\__init__.py", line 767, in f
    new_data = left._combine_frame(right, pass_op, fill_value)
  \pandas\core\frame.py", line 5300, in _combine_frame
    new_data = ops.dispatch_to_series(self, other, _arith_op)
  \pandas\core\ops\__init__.py", line 419, in dispatch_to_series
    new_data = expressions.evaluate(column_op, str_rep, left, right)
  \pandas\core\computation\expressions.py", line 208, in evaluate
    return _evaluate(op, op_str, a, b)
  \pandas\core\computation\expressions.py", line 70, in _evaluate_standard
    return op(a, b)
  \pandas\core\ops\__init__.py", line 388, in column_op
    return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
  \pandas\core\ops\__init__.py", line 388, in <dictcomp>
    return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
  \pandas\core\ops\common.py", line 64, in new_method
    return method(self, other)
  \pandas\core\ops\__init__.py", line 503, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  \pandas\core\ops\array_ops.py", line 197, in arithmetic_op
    res_values = na_arithmetic_op(lvalues, rvalues, op, str_rep)
  \pandas\core\ops\array_ops.py", line 151, in na_arithmetic_op
    result = masked_arith_op(left, right, op)
  \pandas\core\ops\array_ops.py", line 94, in masked_arith_op
    result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for /: 'str' and 'str'

Process finished with exit code 1
  • What is your full traceback (i.e. what line does it say the error is occurring on)? Also, what does the input data look like? Can you provide a sample of the .csv data? – Phillyclause89 May 02 '20 at 04:54
  • @Phillyclause89 Thanks for taking a look. I've edited my question above (I've also manually shortened some of the path names in the traceback). Hope this helps get the message across. Sorry if it appears messy, as I'm still very new to Python :) – Terence Lee May 02 '20 at 06:04
  • How do `df1_corr = df1.astype(float).pct_change()` and `df2_corr = df2.astype(float).pct_change()` work? – jezrael May 02 '20 at 06:07
  • @jezrael Thanks. I think the error I'd get is: ValueError: could not convert string to float: ' 3,190.04 ' – Terence Lee May 02 '20 at 06:25
  • @TerenceLee - OK, then remove it and change `df1 = pd.read_csv('xaxis.csv')` to `df1 = pd.read_csv('xaxis.csv', thousands=',')`, and do the same for `df2` – jezrael May 02 '20 at 06:27
  • @jezrael OK, it now runs without error, but it only prints dates in one column, with NaN in another (I thought it would have been a 3x3 grid of correlation values). – Terence Lee May 02 '20 at 06:42
  • @TerenceLee - Hard question. I tested your data and `pct_change` returns an empty DataFrame, so I guess it is a data-related problem. – jezrael May 02 '20 at 06:53
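
Pulling the comments together, here is a minimal sketch of one way to get the 3x3 grid, not a confirmed fix for the real files. It assumes the dates are day-first, that some cells look like ' 3,190.04 ' (padded with spaces, with a thousands separator) and that others are blank. The inline CSV text and the `load_prices` helper are hypothetical stand-ins for xaxis.csv and yaxis.csv; the rows beyond the three shown in the question are made up for illustration. Instead of `corrwith(axis=1)`, which correlates row by row over matching column labels (the two frames share none, which would be consistent with the NaN per date), it concatenates both sets of returns and slices the cross block out of `corr()`.

```python
import io

import pandas as pd

# Hypothetical inline samples standing in for xaxis.csv and yaxis.csv.
# Rows beyond the question's three sample rows are invented for illustration.
X_CSV = """Date,Stock1,Stock2,Stock3
4/1/2010,1.01,0.81,0.64
5/1/2010,1.02,0.85,0.63
6/1/2010,1.00,0.83,0.65
7/1/2010,1.03,0.84,0.66
8/1/2010,1.04,,0.64
11/1/2010,1.02,0.86,0.67
12/1/2010,1.05,0.88,0.66
"""
Y_CSV = """Date,STI Index,MASB10Y,USGG10YR Index
4/1/2010," 2,894.55 ",2.70,3.79
5/1/2010," 2,920.28 ",2.67,3.84
6/1/2010," 2,930.49 ",2.70,3.82
8/1/2010," 2,915.10 ",2.72,3.80
11/1/2010," 2,940.00 ",2.71,3.85
12/1/2010," 2,951.30 ",2.69,3.81
13/1/2010," 2,948.00 ",2.68,3.83
"""


def load_prices(source):
    """Hypothetical helper: read a price CSV into a float frame indexed by date."""
    # Read everything as text so stray spaces and commas can be cleaned explicitly.
    raw = pd.read_csv(source, dtype=str)
    # Assumption: dates are day-first (4/1/2010 is 4 January 2010).
    dates = pd.to_datetime(raw.pop("Date").str.strip(), format="%d/%m/%Y")
    # Drop thousands separators and padding; anything unparseable becomes NaN.
    values = raw.apply(
        lambda col: pd.to_numeric(
            col.str.replace(",", "", regex=False).str.strip(), errors="coerce"
        )
    )
    values.index = pd.DatetimeIndex(dates, name="Date")
    return values.sort_index()


x = load_prices(io.StringIO(X_CSV))
y = load_prices(io.StringIO(Y_CSV))

# Keep only the dates present in both files before computing returns.
common = x.index.intersection(y.index)
# fill_method=None leaves blank cells as NaN instead of forward-filling them.
x_ret = x.loc[common].pct_change(fill_method=None)
y_ret = y.loc[common].pct_change(fill_method=None)

# corr() uses pairwise-complete observations; slice out the stocks-vs-variables block.
combined = pd.concat([x_ret, y_ret], axis=1)
corr = combined.corr().loc[x_ret.columns, y_ret.columns]

print(corr.round(2))
```

One more thing that may be worth checking: parsed day-first, the sample dates fall in January 2010, which is before the '2010-12-31' lower bound of the filter in the question, so that filter would drop every sample row. That might be why `pct_change` came back empty when the sample data was tested.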
