My DataFrame df has a MultiIndex like this:
df.index.names
FrozenList([u'Ticker', u'Date'])
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 189667 entries, (AAPL, 1992-08-31 00:00:00) to (^DJI, 2017-08-31 00:00:00)
For a DataFrame with a single-level index I would do:
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.2, shuffle=False)
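However, this does not work with a MultiIndex: it just cuts the rows 80/20 in their current order. Because the rows are ordered by ticker first, the test set ends up being the last few tickers in full rather than the most recent 20% of dates.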
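A small offline reproduction may make the problem concrete. This sketch builds a hypothetical frame shaped like the one above (five tickers with the same ten business days each, stacked one ticker after another) and runs the same train_test_split call on it. With shuffle=False the test set is simply the last 20% of rows, which here turns out to be the entire ^DJI history.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data shaped like the real frame: five tickers with the same
# ten business days each, stacked one ticker after another.
tickers = ["AAPL", "AXP", "BA", "CAT", "^DJI"]
dates = pd.bdate_range("2017-08-21", periods=10)
index = pd.MultiIndex.from_product([tickers, dates], names=["Ticker", "Date"])
df = pd.DataFrame({"Close": np.arange(len(index), dtype=float)}, index=index)

# shuffle=False keeps the row order, so the last 20% of rows becomes the test set.
df_train, df_test = train_test_split(df, test_size=0.2, shuffle=False)

# Which tickers ended up in the test set, and where its dates start.
print(df_test.index.get_level_values("Ticker").unique().tolist())
print(df_test.index.get_level_values("Date").min())
```

The test set contains every date for ^DJI and nothing for the other four tickers, so no recent AAPL data is held out at all.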
Note: I do not want random sampling; I just want an 80/20 split based on date.
Any clues?
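One way to split on date rather than on row position, sketched under the assumption that the second index level is named 'Date' as in df.index.names above: collect the distinct dates, sort them, and treat the latest 20% of them as the test period. split_by_date is a hypothetical helper written for this sketch, not a pandas or scikit-learn function, and the toy data is made up so the example runs offline.

```python
import numpy as np
import pandas as pd


def split_by_date(df, test_size=0.2, date_level="Date"):
    """Split a (Ticker, Date) frame chronologically on one shared date cutoff.

    The distinct dates are sorted and the latest ceil(test_size * n_dates)
    of them form the test period, so all rows for a given date land on the
    same side of the split regardless of ticker.
    """
    dates = df.index.get_level_values(date_level)
    unique_dates = dates.unique().sort_values()
    # At least one date goes to the test period.
    n_test = max(1, int(np.ceil(test_size * len(unique_dates))))
    cutoff = unique_dates[-n_test]  # first date of the test period
    is_train = dates < cutoff
    return df.loc[is_train], df.loc[~is_train]


# Hypothetical toy data: AAPL has 10 business days, AXP only the last 5 of them.
all_dates = pd.bdate_range("2017-08-21", periods=10, name="Date")
frames = {
    "AAPL": pd.DataFrame({"Close": np.arange(10.0)}, index=all_dates),
    "AXP": pd.DataFrame({"Close": np.arange(5.0)}, index=all_dates[5:]),
}
df = pd.concat(frames, names=["Ticker", "Date"])

df_train, df_test = split_by_date(df, test_size=0.2)
print(df_train.index.get_level_values("Date").max())  # last training date
print(df_test.index.get_level_values("Date").min())   # first test date
```

Because the cutoff is shared across tickers, a ticker whose history starts or ends at a different date than the others can get more or less than 20% of its own rows in the test set; in the toy data AXP gets 2 of its 5 rows.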
Edit:
This is how I fetch the data (the real list has many more than two tickers):
import pandas as pd
import pandas_datareader.data as web

tickers = ['AAPL', 'AXP']

def get_data(tickers):
    '''Downloads daily O/H/L/C data for all symbols.'''
    def data(ticker):
        return web.DataReader(ticker, 'yahoo')
    datas = [data(ticker) for ticker in tickers]
    # Stack the per-ticker frames into one frame indexed by (Ticker, Date),
    # one ticker after another.
    return pd.concat(datas, keys=tickers, names=['Ticker', 'Date'])

stock_data = get_data(tickers)
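If each ticker should instead contribute its own oldest 80% to training and its newest 20% to testing, a per-ticker split is another option. The sketch below assumes a frame laid out like stock_data (levels named 'Ticker' and 'Date'); split_per_ticker is a hypothetical helper, and the toy data stands in for the Yahoo download so the example runs offline. It uses GroupBy.cumcount in both directions to find each row's position within its ticker's history.

```python
import numpy as np
import pandas as pd


def split_per_ticker(df, test_size=0.2, ticker_level="Ticker", date_level="Date"):
    """Chronological split done separately for each ticker.

    Within every ticker the most recent ceil(test_size * rows_for_that_ticker)
    rows go to test and the older rows go to train.
    """
    df = df.sort_index(level=[ticker_level, date_level])
    groups = df.groupby(level=ticker_level)
    pos = groups.cumcount().to_numpy()                      # 0 = oldest row of the ticker
    from_end = groups.cumcount(ascending=False).to_numpy()  # 0 = newest row of the ticker
    size = pos + from_end + 1                               # rows per ticker
    n_test = np.ceil(test_size * size).astype(int)
    is_train = pos < size - n_test
    return df.loc[is_train], df.loc[~is_train]


# Hypothetical toy data standing in for stock_data: AXP has a shorter history.
all_dates = pd.bdate_range("2017-08-21", periods=10, name="Date")
frames = {
    "AAPL": pd.DataFrame({"Close": np.arange(10.0)}, index=all_dates),
    "AXP": pd.DataFrame({"Close": np.arange(5.0)}, index=all_dates[5:]),
}
stock_data = pd.concat(frames, names=["Ticker", "Date"])

train, test = split_per_ticker(stock_data, test_size=0.2)
print(test.groupby(level="Ticker").size())  # test rows per ticker
```

The trade-off versus a single date cutoff: every ticker gets the requested share, but the test periods no longer start on the same date, so the same calendar date can be in train for one ticker and in test for another.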