I'm trying to take a pandas DataFrame and build a new DataFrame containing the mean of a column, grouped by month and year. I found this answer: "pandas dataframe groupby datetime month", but it didn't quite do the trick.

This is what I have:

dt            grade
2020-01-01    10
2020-01-02    20
2020-02-01    30
2020-02-01    40
2020-03-01    10
2020-03-04    20

This is what I want:

dt           grade_mean
2020-01      15
2020-02      35
2020-03      15

I would also like to be able to group by day or week, and to get a DataFrame as the output. This is what I've done so far:

df = pd.to_datetime(df['dt'], format="%m/%d/%y %I:%M%p")

df_grouped = df.groupby(by=[df.index.month, df.index.year])

df_grouped.grade.mean()

It's almost done, but I still can't get the dataframe.

dummmmf

2 Answers

First, convert your date column to datetime type:

# this line is wrong: it overwrites `df` with a single Series
# df = pd.to_datetime(df['dt'], format="%m/%d/%y %I:%M%p")
# assign the result back to the column instead
df['dt'] = pd.to_datetime(df['dt'])

# and groupby with two keys can be a lot slower than with just one
# df_grouped = df.groupby(by=[df.index.month, df.index.year])

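Then try using to_period, selecting the column you want to average and calling reset_index so the result is a dataframe:

out = df.groupby(df['dt'].dt.to_period('M'))['grade'].mean().reset_index()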
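For reference, here is a minimal end-to-end sketch of the to_period approach using the sample data from the question. The `grade_mean` column name comes from the desired output in the question; the variable name `monthly` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": ["2020-01-01", "2020-01-02", "2020-02-01",
           "2020-02-01", "2020-03-01", "2020-03-04"],
    "grade": [10, 20, 30, 40, 10, 20],
})
df["dt"] = pd.to_datetime(df["dt"])

# Group on the monthly period of each date, average `grade`,
# and turn the resulting Series back into a dataframe.
monthly = (
    df.groupby(df["dt"].dt.to_period("M"))["grade"]
      .mean()
      .reset_index(name="grade_mean")
)
print(monthly)
```

The `dt` column of the result holds monthly periods, which print as `2020-01`, `2020-02`, and so on.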

Alternatively, pd.Grouper, as in the accepted answer to the linked question, works well:

# use `W` for week and `D` for day
# note: pandas 2.2+ deprecates `M` here in favor of `ME` (month end)
out = df.groupby(pd.Grouper(freq='M', key='dt')).mean()
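To illustrate the day and week variants, here is a small sketch built on pd.Grouper. The helper name `mean_by` is hypothetical (not part of pandas), and dropping the empty bins with `dropna` is my assumption about what you want, since pd.Grouper also emits a row for every day or week that has no data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-02-01",
                          "2020-02-01", "2020-03-01", "2020-03-04"]),
    "grade": [10, 20, 30, 40, 10, 20],
})

def mean_by(frame, freq):
    """Hypothetical helper: mean of `grade` per `freq` bin, as a dataframe."""
    return (
        frame.groupby(pd.Grouper(key="dt", freq=freq))["grade"]
             .mean()
             .dropna()  # drop bins with no rows (assumption: they are unwanted)
             .reset_index(name="grade_mean")
    )

daily = mean_by(df, "D")   # one row per calendar day that has data
weekly = mean_by(df, "W")  # one row per week (ending Sunday) that has data
print(daily)
print(weekly)
```

With pd.Grouper the `dt` column stays a timestamp (the bin label) rather than a period, so weekly rows are labeled with the Sunday that closes each week.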
Quang Hoang

One method, though possibly quite expensive because you're grouping on strings rather than integers, is to use dt.strftime on your datetime column.

# assuming `dt` is already a datetime column.
# df['dt'] = pd.to_datetime(df['dt'],format='%Y-%m-%d')
df.groupby([df['dt'].dt.strftime('%Y-%m')])['grade'].mean().reset_index()

        dt  grade
0  2020-01     15
1  2020-02     35
2  2020-03     15
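The same idea extends to other granularities by changing the format string. A rough sketch, assuming the sample data from the question; the helper name `mean_by_format` and the `grade_mean` column name are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-02-01",
                          "2020-02-01", "2020-03-01", "2020-03-04"]),
    "grade": [10, 20, 30, 40, 10, 20],
})

def mean_by_format(frame, fmt):
    """Hypothetical helper: group on the formatted date string."""
    keys = frame["dt"].dt.strftime(fmt)  # plain strings such as '2020-01'
    return frame.groupby(keys)["grade"].mean().reset_index(name="grade_mean")

by_month = mean_by_format(df, "%Y-%m")
by_day = mean_by_format(df, "%Y-%m-%d")
print(by_month)
print(by_day)
```

Keep in mind that the `dt` column in the result now holds plain strings, so you would need pd.to_datetime again before doing any further date arithmetic on it.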
Umar.H