1

I have a dataframe,df after converting DateTimeC column to a datetime type such as:

  Index     DateTimeC                       eventName  
    0      2017-08-20 01:11:24.210000        ABC           
    1      2017-08-20 01:11:30.224000        CDE         
    2      2017-08-20 02:16:30.210000        CDE       
    3      2017-08-20 02:27:30.211000        CDE       
    2      2017-09-10 01:30:40.212000        DEF            
    3      2017-09-11 01:35:23.122000        CDE            
    4      2017-09-11 02:22:22.145000        CDE            
    5      2017-09-16 02:26:11.222000        DEF           

I intend to groupby month and hour and count the count of events in eventName within the grouped object.So applying this code:

 df2=df.groupby([df['DateTimeC'].dt.month,df['DateTimeC'].dt.hour])['EventName'].count()

I get:

   Index   EventName                       
    8,1     2          
    8,2     2         
    9,1     2           
    9,2     2           

However, i want to use the pivot_table () on the resulting series as month as the index and hour as the column while the value parameter should be the frequency. So the resulting dataframe should be:

   Index    0  1  2  3  4  5  6  7  8... 24                
    8       0  2  2  0  0  0  0  0  0... 0
    9       0  2  2  0  0  0  0  0  0... 0     

So what will be the corresponding argument for the parameters of the pivot_table () since the date and time are in the same column: DateTimeC

I tried to add the rename_index to rename the column for the frequency/count result so I can pass the new name to the 'value' parameter in the pivot_table () using this code:

 df2=df.groupby([df['DateTimeC'].dt.month,df['DateTimeC'].dt.hour])['EventName'].count().reset_index(name='frequency')

but i get this error:

ValueError: cannot insert DateTimeC, already exists

Also, get the mean per hour for each month and transform the number associated with a specific month to its word equivalent

    Index      averagePerHour                       
    August       0.17          
    September    0.17         
Bode
  • 527
  • 2
  • 9
  • 19

1 Answers1

1

I think need unstack for reshape and reindex if necessary add missing hours:

df2=(df.groupby([df['DateTimeC'].dt.month.rename('month'),
                df['DateTimeC'].dt.hour.rename('hour')])
      .size()
      .unstack(fill_value=0)
      .reindex(columns=np.arange(24), fill_value=0))
print (df2)
hour   0   1   2   3   4   5   6   7   8   9  ...  14  15  16  17  18  19  20  \
month                                         ...                               
8       0   2   2   0   0   0   0   0   0   0 ...   0   0   0   0   0   0   0   
9       0   2   2   0   0   0   0   0   0   0 ...   0   0   0   0   0   0   0   

hour   21  22  23  
month              
8       0   0   0  
9       0   0   0  

[2 rows x 24 columns]

And then create dictionary for renamewith mean:

L = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
d = dict(enumerate(L, 1))
df3 = df2.mean(axis=1).rename(d).to_frame('averagePerHour')
print (df3)
       averagePerHour
month                
Aug          0.166667
Sep          0.166667

If omit reindex and missing some hours then mean is different:

df2=(df.groupby([df['DateTimeC'].dt.month.rename('month'),
                df['DateTimeC'].dt.hour.rename('hour')])
      .size()
      .unstack(fill_value=0)
      )
print (df2)
hour   1  2
month      
8      2  2
9      2  2

L = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
d = dict(enumerate(L, 1))
df3 = df2.mean(axis=1).rename(d).to_frame('averagePerHour')
print (df3)
       averagePerHour
month                
Aug               2.0
Sep               2.0

EDIT: If want convert months number to stings use dt.strftime, check also http://strftime.org/:

df2=(df.groupby([df['DateTimeC'].dt.strftime('%B').rename('month'),
                df['DateTimeC'].dt.hour.rename('hour')])
      .size()
      .unstack(fill_value=0)
      )
print (df2)
hour       1  2
month          
August     2  2
September  2  2

df3 = df2.mean(axis=1).to_frame('averagePerHour')
print (df3)
           averagePerHour
month                    
August                2.0
September             2.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks man, it worked but for getting the mean: is it possible i get a simpler method that uses striptime() directly on the month column to transform the number of the month to its corresponding word like (3 to March) and helps calculate the mean with the mean() – Bode Aug 27 '18 at 12:46
  • i am getting this error: AttributeError: 'list' object has no attribute 'dt' – Bode Aug 27 '18 at 13:35
  • @Bode - Maybe some typo? I just realised need `%B` instead `%a`, check edited answer. – jezrael Aug 27 '18 at 13:40
  • okay! it now works, thanks but the month is not arranged in ascending order..like March,april,May,June. How can i sort the month? – Bode Aug 27 '18 at 13:56
  • @Bode - I think need [this](https://stackoverflow.com/a/40816203/2901002) solution, only replace `Jan` to `January` ans similar for same months – jezrael Aug 27 '18 at 13:58