I have multiple time series in CSV files, with different files each holding several series, and I want to use Python to compute returns and volatilities. What is the most efficient way:
1) to store the data (i.e. is it more efficient to load the whole table directly as a DataFrame, or to load each time series into a dictionary where the key is the equity name and the values are the prices)? I sketch both layouts right after this list.
2) to compute the returns and volatilities?
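
To make point 1 concrete, here is roughly what I mean by the two layouts (the file name and the dates-in-the-first-column layout are just placeholder assumptions):

import pandas as pd

# Option A: load the whole file as one DataFrame (dates as index, one equity per column)
df = pd.read_csv('prices_1.csv', index_col=0, parse_dates=True)

# Option B: a dict mapping each equity name to its price Series
prices = {name: df[name] for name in df.columns}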
I know I could use NumPy and pandas to do the following:

import numpy as np
import pandas as pd

xls = pd.ExcelFile(filepath)              # load the file
df = xls.parse(xls.sheet_names[sheetNo])  # now the sheet content is in a DataFrame
df['A_ret'] = np.log(df['A'] / df['A'].shift(1))                           # 1-day log return for time series A
df['A_volatility'] = df['A_ret'].rolling(window=252).std() * np.sqrt(252)  # rolling 1y volatility for time series A
But since I have many columns and many files to process, I was wondering if there is a better way. I have read that Numba can compile parts of Python code and make it very fast, but that seems to work best with explicit for loops, and I am not sure whether I can leverage it here to compute the returns and volatilities faster.
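
For example, I assume something like the sketch below would process every column and every file without naming each series individually (the glob pattern and CSV layout are placeholder assumptions), but I don't know whether this is actually the most efficient approach or whether Numba could still speed it up:

import glob
import numpy as np
import pandas as pd

for filepath in glob.glob('data/*.csv'):                  # hypothetical folder layout
    df = pd.read_csv(filepath, index_col=0, parse_dates=True)
    rets = np.log(df / df.shift(1))                       # 1-day log returns for every column at once
    vols = rets.rolling(window=252).std() * np.sqrt(252)  # rolling 1y volatility for every column
    # ...store or write out rets and vols here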