Edit: I figured it out. I need to sort_index before I can do any slicing...
First some context
I have a dataframe like this (note that there might be more tickers than SPY, I just took the one ticker here...):
open high low close adj_close volume ticker date SPY 1993-01-29 43.9688 43.9688 43.7500 43.9375 25.7297 1003200 1993-02-01 43.9688 44.2500 43.9688 44.2500 25.9127 480500 1993-02-02 44.2188 44.3750 44.1250 44.3438 25.9676 201300 1993-02-03 44.4062 44.8438 44.3750 44.8125 26.2421 529400 1993-02-04 44.9688 45.0938 44.8750 45.0000 26.3519 531500 ... ... ... ... ... ... ... 2021-10-18 443.9700 447.5500 443.2700 447.1900 447.1900 62213228 2021-10-19 448.9200 450.7100 448.2700 450.6400 450.6400 46996827 2021-10-20 451.1300 452.7320 451.0100 452.4100 452.4100 49571569 2021-10-21 451.7700 453.8290 451.3100 453.5900 453.5900 41305438 2021-10-22 453.1300 454.6700 451.0500 453.1200 453.1200 58845088
And if say if I want to get SPY's price for 2015-01, I think I should be able to do this:
df.loc[("SPY", "2015-01")]
This will nicely preserve the shape of the df, but it turns out I cannot, and then I tried to use the following syntax, which if worked the dataframe will become a df with single index.
df.loc["SPY"]["2015-01"]
But unfortunately, both solution does not work. With both methods giving me KeyError
. I have verified the second level of the index did contain date
object, not pure strings.
In [24]: df.index
Out[24]:
MultiIndex([('SPY', '1993-01-29'),
('SPY', '1993-02-01'),
('SPY', '1993-02-02'),
('SPY', '1993-02-03'),
('SPY', '1993-02-04'),
('SPY', '1993-02-05'),
('SPY', '1993-02-08'),
('SPY', '1993-02-09'),
('SPY', '1993-02-10'),
('SPY', '1993-02-11'),
...
('SPY', '2021-10-11'),
('SPY', '2021-10-12'),
('SPY', '2021-10-13'),
('SPY', '2021-10-14'),
('SPY', '2021-10-15'),
('SPY', '2021-10-18'),
('SPY', '2021-10-19'),
('SPY', '2021-10-20'),
('SPY', '2021-10-21'),
('SPY', '2021-10-22')],
names=['ticker', 'date'], length=7237)
In [25]: df.index.get_level_values(1)
Out[25]:
Index(['1993-01-29', '1993-02-01', '1993-02-02', '1993-02-03', '1993-02-04',
'1993-02-05', '1993-02-08', '1993-02-09', '1993-02-10', '1993-02-11',
...
'2021-10-11', '2021-10-12', '2021-10-13', '2021-10-14', '2021-10-15',
'2021-10-18', '2021-10-19', '2021-10-20', '2021-10-21', '2021-10-22'],
dtype='object', name='date', length=7237)
The only thing I managed to get working is something like this:
In [19]: df = get_eod_data(["SPY"])
In [20]: df = df.loc["SPY"]
In [21]: df.index = pd.to_datetime(df.index)
In [22]: df.loc["2015-01"]
Out[22]:
open high low close adj_close volume
date
2015-01-02 206.38 206.88 204.180 205.4300 180.9772 121465865
2015-01-05 204.17 204.37 201.350 201.7200 177.7088 169632646
2015-01-06 202.09 202.72 198.855 199.8200 176.0349 209151408
2015-01-07 201.42 202.72 200.880 202.3100 178.2285 125346709
2015-01-08 204.01 206.16 203.990 205.9000 181.3912 147217784
2015-01-09 206.40 206.42 203.510 204.2500 179.9376 158567288
2015-01-12 204.41 204.60 201.920 202.6500 178.5281 144396067
2015-01-13 204.12 205.48 200.510 202.0800 178.0259 214553306
2015-01-14 199.65 201.10 198.570 200.8600 176.9511 192991092
2015-01-15 201.63 202.01 198.880 199.0199 175.3301 176613906
2015-01-16 198.77 201.82 198.550 201.6300 177.6295 211879605
2015-01-20 202.40 202.72 200.170 202.0550 178.0039 130991069
2015-01-21 201.50 203.66 200.940 203.0800 178.9069 122942707
2015-01-22 203.99 206.26 202.330 206.1000 181.5674 174356029
2015-01-23 205.79 206.10 204.810 204.9700 180.5719 117516753
2015-01-26 204.71 205.56 203.850 205.4500 180.9948 92009711
2015-01-27 202.97 204.12 201.740 202.7400 178.6073 134044598
2015-01-28 204.17 204.29 199.910 200.1400 176.3168 168514312
2015-01-29 200.38 202.30 198.680 201.9902 177.9468 173585424
2015-01-30 200.57 202.17 199.130 199.4500 175.7090 197729724
It seems I need to manually convert the second level of df into a DateTimeIndex
then I can start using string indexing for it.
My question is this: what should I do in my case so that I can easily use a syntax like df.loc[("SPY", "2015-01")]
or df.loc["SPY"]["2015-01"]
to index the data for a date range easily?