Xarray or Multi Index DataFrame for large scale data analysis

Question

I just found out how to import loads of historical stock prices into an array at http://www.pythonforfinance.net/2017/01/21/investment-portfolio-optimisation-with-python/

I did this with the following code:

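import pandas_datareader as pdr
import pandas as pd
import numpy as np
import fix_yahoo_finance as yf

# Uncomment to route the Yahoo requests through fix_yahoo_finance.
#yf.pdr_override()

source = 'yahoo'
stocks = ['AAPL', 'AMZN', 'MSFT', 'GOOG', 'GE']
start = '2017-01-01'
end_date = '2017-12-10'

# Download every field for each ticker between start and end_date,
# then keep only the adjusted close.
data_array = pdr.DataReader(stocks, source, start, end_date)['Adj Close']

print(data_array)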

The issue is that I get the following warning:

Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a .to_xarray() method to help automate this conversion.
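
For reference, the MultiIndex layout that the warning mentions can be sketched with pandas alone. This is only an illustration under assumptions: the prices are random numbers rather than real quotes, and the names `fields`, `wide` and `long` are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for downloaded data (random values, not real quotes).
dates = pd.date_range("2017-01-02", periods=4, freq="B", name="Date")
symbols = ["AAPL", "AMZN", "MSFT"]
rng = np.random.default_rng(0)

# One wide table (dates x symbols) per field, as a Panel used to hold them.
fields = {
    name: pd.DataFrame(
        rng.uniform(50, 150, size=(len(dates), len(symbols))),
        index=dates,
        columns=symbols,
    )
    for name in ["Adj Close", "Close"]
}

# Wide layout: the columns carry a two-level MultiIndex (field, symbol).
wide = pd.concat(fields, axis=1)
wide.columns.names = ["Field", "Symbol"]

# Long layout: one row per (date, symbol) pair, one column per field.
long = pd.concat({name: df.stack() for name, df in fields.items()}, axis=1)
long.index.names = ["Date", "Symbol"]

# Selecting one field from the wide layout gives back a plain 2-D table.
adj_close = wide["Adj Close"]
print(adj_close)
```

Either layout holds the same three dimensions (field, date, symbol); the long one is roughly what Panel.to_frame() used to return.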

I need advice. My goal is to save the data to a CSV or Excel file. The next step is to take the average price over the last x days, divide the current price by that average, and then compute a percent rank across the stocks for each day. Given the multiple steps, I planned to write each result to Excel as I go, so that I do not have to recalculate the full history every day.
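
To make the steps concrete, here is a minimal sketch of what I have in mind, using a plain dates-by-tickers DataFrame. Everything here is an assumption for illustration: the prices are synthetic random walks, `window` stands for x, the average includes the current day, and the output file name is hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic prices standing in for the downloaded 'Adj Close' table.
dates = pd.date_range("2017-01-02", periods=30, freq="B", name="Date")
symbols = ["AAPL", "AMZN", "MSFT", "GOOG", "GE"]
rng = np.random.default_rng(42)
log_returns = rng.normal(0, 0.01, size=(len(dates), len(symbols)))
prices = pd.DataFrame(
    100 * np.exp(log_returns.cumsum(axis=0)), index=dates, columns=symbols
)

window = 5  # hypothetical value for x

# Step 1: average of the last `window` days (current day included).
rolling_mean = prices.rolling(window).mean()

# Step 2: current price divided by that average.
ratio = prices / rolling_mean

# Step 3: percent rank across the stocks, computed separately for each day.
pct_rank = ratio.rank(axis=1, pct=True)

# Step 4: save the result (hypothetical file name in the temp directory).
out_path = Path(tempfile.gettempdir()) / "pct_rank.csv"
pct_rank.to_csv(out_path)
```

The first `window - 1` rows are NaN because the average is not yet defined there. To exclude the current day from the average, `prices.shift(1).rolling(window).mean()` could be used instead.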

QUESTION: Given this goal, which would be the better approach: xarray or a MultiIndex on a DataFrame?

This question may help: https://stackoverflow.com/questions/42876278/when-to-use-multiindexing-vs-xarray-in-pandas — Tim, Mar 08 '18 at 12:48
Have you figured out an answer to your question? It would help me, if you could answer your own question :) — NeStack, Oct 12 '19 at 14:09
