3

I have a dataframe with datetime index:df.head(6)

                          NUMBERES              PRICE    
DEAL_TIME
2015-03-02 12:40:03              5                 25   
2015-03-04 14:52:57              7                 23   
2015-03-03 08:10:09             10                 43   
2015-03-02 20:18:24              5                 37   
2015-03-05 07:50:55              4                 61   
2015-03-02 09:08:17              1                 17   

The dataframe includes the data of one week. Now I need to count the time period of the day. If time period is 1 hour, I know the following method would work:

df_grouped = df.groupby(df.index.hour).count()

But I don't know how to do when the time period is half hour. How can I realize it?

UPDATE:

I was told that this question is similar to How to group DataFrame by a period of time?

But I had tried the methods mentioned. Maybe it's my fault that I didn't say it clearly. 'DEAL_TIME' ranges from '2015-03-02 00:00:00' to '2015-03-08 23:59:59'. If I use pd.TimeGrouper(freq='30Min') or resample(), the time periods would range from '2015-03-02 00:30' to '2015-03-08 23:30'. But what I want is a series like below:

              COUNT      
DEAL_TIME
00:00:00         53 
00:30:00         49 
01:00:00         31
01:30:00         22
02:00:00          1
02:30:00         24
03:00:00         27
03:30:00         41
04:00:00         41
04:30:00         76
05:00:00         33
05:30:00         16
06:00:00         15
06:30:00          4
07:00:00         60
07:30:00         85
08:00:00          3
08:30:00         37
09:00:00         18
09:30:00         29
10:00:00         31
10:30:00         67
11:00:00         35
11:30:00         60
12:00:00         95
12:30:00         37
13:00:00         30
13:30:00         62
14:00:00         58
14:30:00         44
15:00:00         45
15:30:00         35
16:00:00         94
16:30:00         56
17:00:00         64
17:30:00         43
18:00:00         60
18:30:00         52
19:00:00         14
19:30:00          9
20:00:00         31
20:30:00         71
21:00:00         21
21:30:00         32
22:00:00         61
22:30:00         35
23:00:00         14
23:30:00         21

In other words, the time period should be irrelevant to the date.

Community
  • 1
  • 1
J Huang
  • 87
  • 1
  • 5

2 Answers2

4

You need a 30-minute time grouper for this:

grouper = pd.TimeGrouper(freq="30T")

You also need to remove the 'date' part from the index:

df.index = df.reset_index()['index'].apply(lambda x: x - pd.Timestamp(x.date()))

Now, you can group by time alone:

df.groupby(grouper).count()

You can find somewhat obscure TimeGrouper documentation here: pandas resample documentation (it's actually resample documentation, but both features use the same rules).

Community
  • 1
  • 1
DYZ
  • 55,249
  • 10
  • 64
  • 93
  • No, I know this method. Please read my question again. I have something updated. – J Huang Mar 16 '17 at 06:32
  • Your method seems to be sense. But index will change to timedelta after removing the 'date', Timegrouper() will return a ValueError: Inferred frequency None from passed timedeltas does not conform to passed frequency 30T. – J Huang Mar 16 '17 at 07:24
  • Yes, the index becomes a `timedelta64`. But it does not cause any problems with the grouper (I use pandas 0.18.1). The first row of the result is `07:50:55 2 2`, you can adjust the base if you want the window to start at 00 and 30 minutes. – DYZ Mar 16 '17 at 07:30
  • Thank you very much, I have benefited a lot. – J Huang Mar 16 '17 at 07:58
1
  • In pandas, the most common way to group by time is to use the .resample() function.
  • In v0.18.0 this function is two-stage.

  • This means that df.resample('M') creates an object to which we can apply other functions (mean, count, sum, etc.)

The code snippet will be like,

df.resample('M').count()

You can refer here for example.

Karthikeyan KR
  • 1,134
  • 1
  • 17
  • 38
  • Thank you for your reply. But this is not what I need. Please read my question again. I have something updated. – J Huang Mar 16 '17 at 06:33
  • 1
    This does not answer the question I'm afraid, what he is looking for is a way to group by 15 minutes over all days (so one 15 min interval should contain the data in that interval over all included days) not just consecutive 15min stretches! – meow May 22 '19 at 12:30