1

I have a data frame with index (ID, Date), where I would like to select all observations within a certain timeframe (e.g. 06-1988 : 11-1988). In case not all observations are present within the timeframe for a certain ID, I would still like to include the ones that are present.

Visualization of my data frame to give a better idea: (The panel is not balanced)

                           Var1          Var2
    ID     Date
    10113  2010-07         24.7000       24.7000      
           2010-08         25.2600       24.7000      
           2010-09         25.2800       25.2800  
           2010-10         25.3700       25.3700 
    10223  2010-09         24.7000       24.7000
           2010-10         25.2600       25.2600    
           2011-11         25.2800       25.2800  
           2011-12         25.3700       25.3700 
           2012-01         25.2900       25.2900 

For instance I would want all observations between 2010-09 and 2011-12, I would want to get the following output:

                           Var1          Var2
    ID     Date
    10113  2010-09         25.2800       25.2800  
           2010-10         25.3700       25.3700        
    10223  2010-09         24.7000       24.7000
           2010-10         25.2600       25.2600    
           2011-11         25.2800       25.2800  
           2011-12         25.3700       25.3700 

I also have a series with the same two multi-index settings, and there the following command worked:

X.loc[:,'1988-06':'1998-07']

If I wanted all observations between 1988-06 and 1998-07 for all ID's. When I try this same approach for the Dataframe, I get a KeyError.

Is it possible to do the same with a Dataframe?

John
  • 555
  • 3
  • 16

2 Answers2

0

Have you tried to access it with only one 'Date' value? Something like: X.loc[:,'1988-06']

This issue could also help: pandas: slice a MultiIndex by range of secondary index

Jules
  • 395
  • 4
  • 17
  • Yes I did try that, and it also gave me the same error. However when I did the following: X.loc[10113].loc['01-1997':'01-1998'], then it did gave me the correct output for index 10113 only. Though this only worked for ID's where all dates within ['01-1997':'01-1998'] were present. – John May 16 '19 at 13:11
  • Have you tried to pass a tuple as index? Like they are doing here [https://stackoverflow.com/a/13389808/10716506] – Jules May 16 '19 at 13:15
  • Yes I tried it as well, but I think it doesn't work with the ':' that I want to use between the dates. – John May 16 '19 at 13:24
  • Yes i'm currently trying to convert them to datetimes, perhaps that fixes the issue – John May 16 '19 at 13:40
0

Try this:

out_frame = df[(df['Date'] > '2010-09') & (df['Date'] < '2012-12')]
  • This part df['Date'] is not working, I think it has to do with the multi indexing. I get the following error: KeyError: 'Date' – John May 16 '19 at 13:24
  • print out you data frame structure or attach example file –  May 16 '19 at 13:27