1

My multi-indexed data frame is as follows:

df.head()
                         Output  
Unit    Timestamp                                               
1   2016-06-01 00:00:00  225894.9
    2016-06-01 01:00:00  225895.9
    2016-06-01 02:00:00  225896.9   
    2016-06-01 03:00:00  225897.9 
    2016-06-01 04:00:00  225898.9 

df.tail()

                         Output 
Unit Timestamp                                               
16  2016-06-30 18:00:00  150543.1        
    2016-06-30 19:00:00  150544.1        
    2016-06-30 21:00:00  150546.1        
    2016-06-30 22:00:00  150547.1     
    2016-06-30 23:00:00  150548.1  

That is, one month's worth of hourly data for 16 units.

I want to pick out one day's data for a given unit. I have an array of the days that appear in the dataframe,

days = array([datetime.date(2016, 6, 1), datetime.date(2016, 6, 2), datetime.date(2016, 6, 3), datetime.date(2016, 6, 4),... etc etc

If I do df.ix[5,'2016-06-10'], I get exactly what I want, that is the day's for unit 5 that date

                       Output
Timestamp
2016-06-10 00:00:00    152364.6
2016-06-10 01:00:00    152365.7
2016-06-10 02:00:00    152366.6
...
2016-06-10 21:00:00    152386.6
2016-06-10 22:00:00    152386.6
2016-06-10 23:00:00    152387.6

But the same indexing doesn't seem to work when I have a datetime.date object instead of a string. For example:

tenth = days[9] evaulates to datetime.date(2016, 6, 10), so far so good.

However when I do df.ix[5,tenth] I get "KeyError: u'no item named 2016-06-10'"

So, the '2016-06-10' "does what I mean" and returns all the data on that day. How can I do the same if with a datetime.date?

djnz0feh
  • 383
  • 4
  • 13
  • can you maybe paste a small sample of your data, not necessarily enough to include more than one unit? perhaps the results of `df.head(40).to_dict()` – juanpa.arrivillaga Jul 13 '16 at 22:31
  • See this post http://stackoverflow.com/a/38346697/2336654 – piRSquared Jul 13 '16 at 22:38
  • Is your index level `Timestamp` really Pandas `DateTimeIndex`? Double check that. – Kartik Jul 13 '16 at 22:39
  • @piRSquared `type(tenth)` is `datetime.date`. Also I read about `slice` but I think it's in a later pandas version than I have available. Therefore @Karrtik, it is not a DateTimeIndex. That might well help! – djnz0feh Jul 14 '16 at 08:04
  • @Kartik - correction, yes the `Timestamp`s are of type `DatetimeIndex` – djnz0feh Jul 14 '16 at 08:44

2 Answers2

2

When you pass a string that looks like a datetime to the pandas selector ix, it uses it like a condition and returns all elements that satisfy. In this case, the string you are using evaluates to a day. Pandas runs ix and returns all rows within that day. When you pass the datetime object, it looks for an exact match. You don't have one since all your datetimes are non-zero hour times.

Consider:

s = pd.Series(range(5), pd.date_range('2016-03-31 01:00:00', periods=5, freq='H'))
s

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

get all elements on '2016-03-31'

s.ix['2016-03-31']

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64

Now assign a datetime

date = pd.to_datetime('2016-03-31')

s.ix[date]

And...

KeyError: Timestamp('2016-03-31 00:00:00')

If instead we assigned a datetime where we do have an exact match, we should get a result.

date2 = pd.to_datetime('2016-03-31 02:00:00')

Then

s.ix[date2]

Returns:

1

That worked!

To use datetimes and make it look pretty, use a function:

d2s = lambda d: d.strftime('%Y-%d-%m')

Then

s.ix[d2s(date)]

2016-03-31 01:00:00    0
2016-03-31 02:00:00    1
2016-03-31 03:00:00    2
2016-03-31 04:00:00    3
2016-03-31 05:00:00    4
Freq: H, dtype: int64
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • I reckoned something like that must be happening with `ix` processing the string; your solution does work in the sense that it does not give an error but I wanted a whole days' worth of data. I want to take advantage of what `ix` does but using a variable. I did also try `df.ix[5,tenth.strftime('%Y-%d-%m')]` but that doesn't work either and starts to look more that a little silly! – djnz0feh Jul 14 '16 at 08:12
  • `d2s(tenth)` gives me `'2016-06-10'` but `df.ix[5,d2s(tenth)]` gives me the same error. – djnz0feh Jul 14 '16 at 08:38
  • @djnz0feh fixed, I used wrong format. I updated post. – piRSquared Jul 14 '16 at 08:45
  • Arrgh! with `d2s = lambda d: d.strftime('%Y-%m-%d')` it does of course work. Well done. Funny I got that far before posting the question and didn't notice the wrong date format. – djnz0feh Jul 14 '16 at 09:07
1

Let me suggest you a different method rather than using ix. Why don't you directly use a range query?

df = df[df.index.get_level_values('Unit') == 6 & 
        (df.index.get_level_values('Timestamp') >= tenth & 
        (df.index.get_level_values('Timestamp') <= tenth)]

Can you just try whether this is working?

Christin Jose
  • 1,427
  • 1
  • 9
  • 7
  • To use this approach exactly as you say I'd need to maintain an `eleventh` and use it instead of the second `tenth` in suggestion. What does so far seem to work is `df[(df.index.get_level_values('Unit') == 6) & (df.index.get_level_values('Timestamp').date == tenth.date())]`. Note which date is a function and which is a property! – djnz0feh Jul 14 '16 at 08:57