
If I have multiple time series stored in CSV files and want to use Python to compute returns and volatilities, what is the most efficient way? The dataset would be something like the one below, with different files each holding several time series.

[image: sample of the dataset]

1) to store the data (i.e. is it more efficient to load the whole table directly as a DataFrame, or to load each time series into a dictionary whose keys are the equity names and whose values are the prices?)

2) to compute returns and volatilities

I know I could use numpy and pandas to do the following:

import numpy as np
import pandas as pd

xls = pd.ExcelFile(filepath)  # load the file
df = xls.parse(xls.sheet_names[sheetNo])  # now the file content is in the dataframe
df['A_ret'] = np.log(df['A'] / df['A'].shift(1))  # 1-day log return for time series A
df['A_volatility'] = df['A_ret'].rolling(window=252).std() * np.sqrt(252)  # 1y volatility for time series A

But since I have many columns and many files to process, I was wondering if there is a better way. I have read that Numba can compile parts of Python code and make it very fast, but it seems to work best with explicit for loops, so I am not sure whether I can leverage it here to speed up the return and volatility calculations.
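For example, one vectorized version I could imagine (the file name 'prices.csv' and the 'Date'/price column layout here are just placeholders) would apply the same calculation to every price column at once instead of column by column:

import numpy as np
import pandas as pd

# Hypothetical layout: a 'Date' column plus one price column per equity (A, B, C, ...)
df = pd.read_csv('prices.csv', parse_dates=['Date'], index_col='Date')

# 1-day log returns for every price column at once
returns = np.log(df / df.shift(1))

# Rolling 1y (252 trading days) volatility, annualized, for every column at once
volatility = returns.rolling(window=252).std() * np.sqrt(252)

# Keep everything in one DataFrame with suffixed column names
result = pd.concat(
    [df, returns.add_suffix('_ret'), volatility.add_suffix('_volatility')],
    axis=1,
)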

  • It would be much faster to load the entire CSV as a DataFrame rather than processing it all as a dictionary. Even with many files you can use a for loop and dynamically create a DataFrame for each CSV file, or concatenate all of the CSV data into one large DataFrame (see the sketch after these comments). And you can do column-wise calculations of both returns and volatility quickly, as you have shown. – Edeki Okoh Apr 02 '19 at 20:30
  • 2
  • Have a look at this: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – Erfan Apr 02 '19 at 20:34
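Following the suggestion in the comments, a minimal sketch (the 'data/*.csv' pattern, the 'Date' column, and the per-file layout are assumptions) of reading several CSV files, concatenating them into one wide DataFrame, and then computing returns and volatilities column-wise:

import glob

import numpy as np
import pandas as pd

# Hypothetical layout: all CSV files sit in a 'data' folder, each with a
# 'Date' column and one price column per equity
frames = []
for path in glob.glob('data/*.csv'):
    frames.append(pd.read_csv(path, parse_dates=['Date'], index_col='Date'))

# One wide DataFrame: rows aligned on Date, price columns from all files
prices = pd.concat(frames, axis=1)

# Column-wise log returns and annualized 1y volatility, computed in one pass
returns = np.log(prices / prices.shift(1))
volatility = returns.rolling(window=252).std() * np.sqrt(252)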

0 Answers