1

Edit: I figured it out. I need to sort_index before I can do any slicing...

First some context

I have a dataframe like this (note that there might be more tickers than SPY, I just took the one ticker here...):

                       open      high       low     close  adj_close    volume
ticker date                                                                   
SPY    1993-01-29   43.9688   43.9688   43.7500   43.9375    25.7297   1003200
       1993-02-01   43.9688   44.2500   43.9688   44.2500    25.9127    480500
       1993-02-02   44.2188   44.3750   44.1250   44.3438    25.9676    201300
       1993-02-03   44.4062   44.8438   44.3750   44.8125    26.2421    529400
       1993-02-04   44.9688   45.0938   44.8750   45.0000    26.3519    531500
...                     ...       ...       ...       ...        ...       ...
       2021-10-18  443.9700  447.5500  443.2700  447.1900   447.1900  62213228
       2021-10-19  448.9200  450.7100  448.2700  450.6400   450.6400  46996827
       2021-10-20  451.1300  452.7320  451.0100  452.4100   452.4100  49571569
       2021-10-21  451.7700  453.8290  451.3100  453.5900   453.5900  41305438
       2021-10-22  453.1300  454.6700  451.0500  453.1200   453.1200  58845088

And if say if I want to get SPY's price for 2015-01, I think I should be able to do this:

df.loc[("SPY", "2015-01")]

This will nicely preserve the shape of the df, but it turns out I cannot, and then I tried to use the following syntax, which if worked the dataframe will become a df with single index.

df.loc["SPY"]["2015-01"]

But unfortunately, both solution does not work. With both methods giving me KeyError. I have verified the second level of the index did contain date object, not pure strings.

In [24]: df.index
Out[24]: 
MultiIndex([('SPY', '1993-01-29'),
            ('SPY', '1993-02-01'),
            ('SPY', '1993-02-02'),
            ('SPY', '1993-02-03'),
            ('SPY', '1993-02-04'),
            ('SPY', '1993-02-05'),
            ('SPY', '1993-02-08'),
            ('SPY', '1993-02-09'),
            ('SPY', '1993-02-10'),
            ('SPY', '1993-02-11'),
            ...
            ('SPY', '2021-10-11'),
            ('SPY', '2021-10-12'),
            ('SPY', '2021-10-13'),
            ('SPY', '2021-10-14'),
            ('SPY', '2021-10-15'),
            ('SPY', '2021-10-18'),
            ('SPY', '2021-10-19'),
            ('SPY', '2021-10-20'),
            ('SPY', '2021-10-21'),
            ('SPY', '2021-10-22')],
           names=['ticker', 'date'], length=7237)

In [25]: df.index.get_level_values(1)
Out[25]: 
Index(['1993-01-29', '1993-02-01', '1993-02-02', '1993-02-03', '1993-02-04',
       '1993-02-05', '1993-02-08', '1993-02-09', '1993-02-10', '1993-02-11',
       ...
       '2021-10-11', '2021-10-12', '2021-10-13', '2021-10-14', '2021-10-15',
       '2021-10-18', '2021-10-19', '2021-10-20', '2021-10-21', '2021-10-22'],
      dtype='object', name='date', length=7237)

The only thing I managed to get working is something like this:

In [19]: df = get_eod_data(["SPY"])

In [20]: df = df.loc["SPY"]

In [21]: df.index = pd.to_datetime(df.index)

In [22]: df.loc["2015-01"]
Out[22]: 
              open    high      low     close  adj_close     volume
date                                                               
2015-01-02  206.38  206.88  204.180  205.4300   180.9772  121465865
2015-01-05  204.17  204.37  201.350  201.7200   177.7088  169632646
2015-01-06  202.09  202.72  198.855  199.8200   176.0349  209151408
2015-01-07  201.42  202.72  200.880  202.3100   178.2285  125346709
2015-01-08  204.01  206.16  203.990  205.9000   181.3912  147217784
2015-01-09  206.40  206.42  203.510  204.2500   179.9376  158567288
2015-01-12  204.41  204.60  201.920  202.6500   178.5281  144396067
2015-01-13  204.12  205.48  200.510  202.0800   178.0259  214553306
2015-01-14  199.65  201.10  198.570  200.8600   176.9511  192991092
2015-01-15  201.63  202.01  198.880  199.0199   175.3301  176613906
2015-01-16  198.77  201.82  198.550  201.6300   177.6295  211879605
2015-01-20  202.40  202.72  200.170  202.0550   178.0039  130991069
2015-01-21  201.50  203.66  200.940  203.0800   178.9069  122942707
2015-01-22  203.99  206.26  202.330  206.1000   181.5674  174356029
2015-01-23  205.79  206.10  204.810  204.9700   180.5719  117516753
2015-01-26  204.71  205.56  203.850  205.4500   180.9948   92009711
2015-01-27  202.97  204.12  201.740  202.7400   178.6073  134044598
2015-01-28  204.17  204.29  199.910  200.1400   176.3168  168514312
2015-01-29  200.38  202.30  198.680  201.9902   177.9468  173585424
2015-01-30  200.57  202.17  199.130  199.4500   175.7090  197729724

It seems I need to manually convert the second level of df into a DateTimeIndex then I can start using string indexing for it.

My question is this: what should I do in my case so that I can easily use a syntax like df.loc[("SPY", "2015-01")] or df.loc["SPY"]["2015-01"] to index the data for a date range easily?

Bob Fang
  • 6,963
  • 10
  • 39
  • 72
  • `df.loc["SPY"]["2015-01"]` <- the second thing is always going to be a column look up whether there is a MultiIndex or not. I suppose you could do `df.loc["SPY"].loc["2015-01"]` but this is making 2 copies... There must be a better way. – Henry Ecker Oct 31 '21 at 13:18
  • @HenryEcker strangely `df.loc["SPY"].loc["2015-01"]` also gives me a key error for '2015-01', I think I have to convert the second level index before I can do any string indexing on it... – Bob Fang Oct 31 '21 at 13:22
  • Don't use `datetime`. If you're using `pandas` then stick to `Timestamp` and the other pandas time functionality – Riley Oct 31 '21 at 13:26
  • Also have a look at the query method in the top answer here https://stackoverflow.com/questions/17921010/how-to-query-multiindex-index-columns-values-in-pandas#18103894 – Riley Oct 31 '21 at 13:33
  • As I'm looking into this, I'm finding that `df.loc[('SPY', '2015-01')]` works fine (ignoring the FutureWarning) and `df.loc["SPY"].loc["2015-01"]` works with no warning. You say the index is of type `date` are you sure? `df.index.dtypes` gives `date datetime64[ns]`? The evidence points to it not being a supported type. – Henry Ecker Oct 31 '21 at 13:34
  • @HenryEcker I realised what I did wrong, the index was not sorted. After I did df.sort_index() everything seems fine now... – Bob Fang Oct 31 '21 at 14:26
  • @HenryEcker thanks for double-checking for me... – Bob Fang Oct 31 '21 at 14:26
  • 1
    Ahhh… that makes sense! date times out of order don’t you well with non exact slicing. Glad you got it figured out :-) – Henry Ecker Oct 31 '21 at 14:30

0 Answers0