2

I need a mean of all the values within that hour and I need to do it for all such hours for each day.

For Example:

Date                    Col1
2016-01-01 07:00:00      1
2016-01-01 07:05:00      2
2016-01-01 07:17:00      3
2016-01-01 08:13:00      2
2016-01-01 08:55:00      10
.
.
.
.
.
.
.
.
2016-12-31 22:00:00      3
2016-12-31 22:05:00      3
2016-12-31 23:13:00      4
2016-12-31 23:33:00      5
2016-12-31 23:53:00      6

So, I need to group all the values within that hour within that date into one ( it's mean ).

Expected Output:

Date                    Col1
2016-01-01 07:00:00      2           ##(2016-01-01 07:00:00, 07:05:00, 07:17:00) 3 values falls between the one hour range for that date i.e. 2016-01-01 07:00:00 - 2016-01-01 07:59:00, both inclusive.
2016-01-01 08:00:00      6
.
.
.
.
.
.
.
.
2016-12-31 22:00:00      3
2016-12-31 23:00:00      5

So, if I do it for whole year then in the end the total number of rows would be 365*24.

I tried solving using this answer but it doesn't work. Can anyone help me?

Gopal Chitalia
  • 430
  • 4
  • 18

2 Answers2

1

Try groupby, dt.hour, mean, reset_index and assign:

print(df.groupby(df['Date'].dt.hour)['Col1'].mean().reset_index().assign(Date=df['Date']))

Output for first two rows:

                 Date  Col1
0 2016-01-01 07:00:00     2
1 2016-01-01 07:05:00     6
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
  • This basically averages all the values in that hour for all the dates. so, let's say 1st january has 2 values between 7-8am and 2nd january has 2 values between 7-8am then it is taking the 4 values and averaging them. what i want is for that specific date and that hour. so 1st january 2 values average and 2nd january 2 values average. I hope I am able to express what i am saying. – Gopal Chitalia Jul 11 '19 at 09:29
  • @GopalChitalia It matches your output, i am doing the logic that you said, don't forget to accept and up-vote it – U13-Forward Jul 11 '19 at 09:30
  • It doesn't. It is basically taking all the values which is at that hour for all the days and averaging it .you can check it. you can test it on this dataset and let me know: https://filebin.net/10umpjabyxf8cz3a P.S: for sure I'll upvote your answer and accept it sir in case it seems to match my question – Gopal Chitalia Jul 11 '19 at 09:39
  • @GopalChitalia Oh, what is the reason to do `2` at a time? – U13-Forward Jul 11 '19 at 09:40
  • I didn't exactly get what you mean by 2? – Gopal Chitalia Jul 11 '19 at 09:42
  • @GopalChitalia As you said you want to average the first two, then average the next `2`, Do I understand correctly? – U13-Forward Jul 11 '19 at 09:42
  • I want all the values in a particular hour for that particular date and average that. so let's say 1st Jan 7am-8am has 4 values, then i want average of that. and say 2nd Jan 7am-8am has 2 value then I want those 2 values average. Right now it is taking all the 6 values and averaging it. basically grouping by date as well as hour. – Gopal Chitalia Jul 11 '19 at 09:46
  • `print(df.groupby([df['Date'].dt.date, df['Date'].dt.hour])['Col1'].mean())` seems to work. Thank you for you help. I have upvoted your answer – Gopal Chitalia Jul 11 '19 at 10:01
1

resample from pandas should fit your case

import pandas as pd

df = pd.DataFrame({
    'Date':['2016-01-01 07:00:00','2016-01-01 07:05:00',
            '2016-01-01 07:17:00' ,'2016-01-01 08:13:00',
            '2016-01-01 08:55:00','2016-12-31 22:00:00',
            '2016-12-31 22:05:00','2016-12-31 23:13:00',
            '2016-12-31 23:33:00','2016-12-31 23:53:00'],
    'Col1':[1, 2, 3, 2, 10, 3, 3, 4, 5, 6]
})

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d') # Convert series to datetime type

df.set_index('Date', inplace=True) # Set Date column as index


# for every hour, take the mean for the remaining columns of the dataframe 
# (in this case only for Col1, fill the NaN with 0 and reset the index)
df.resample('H').mean().fillna(0).reset_index()

df.head()
    Date    Col1
0   2016-01-01 07:00:00 2.0
1   2016-01-01 08:00:00 6.0
2   2016-01-01 09:00:00 0.0
3   2016-01-01 10:00:00 0.0
4   2016-01-01 11:00:00 0.0
dimitris_ps
  • 5,849
  • 3
  • 29
  • 55
  • 1
    Thanks a lot! This works. Also, `print(df.groupby([df['Date'].dt.date, df['Date'].dt.hour])['Col1'].mean())` works too :D – Gopal Chitalia Jul 11 '19 at 10:36