Pandas dataframe groupby datetime month

Question

Consider a CSV file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index = b['date']

I can access the month like so:

b.index.month

However I can't seem to find a function to lump together by month.

If you are struggling with applying any of the answer, please keep in mind that in this question (and therefore in the answers) the Datetime value is assigned to the index of the Dataframe. A quick tip/reminder could be the following: if you have a Datetime column, you can actually access the single Yeay/Month/Day/Hour/Minute value just by doing `my_df.my_column.dt.month` — Federico Dorato, Nov 30 '20 at 13:03

score 256 · Accepted Answer · edited Mar 05 '20 at 18:12

256

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+

edited Mar 05 '20 at 18:12

goodside

4,429
2
22
32

answered Jun 06 '14 at 13:38

Lee

29,398
28
117
170

63

I think the more pandonic ways are to either use `resample` (when it provides the functionality you need) or use a `TimeGrouper`: `df.groupby(pd.TimeGrouper(freq='M'))` – Karl D. Jun 06 '14 at 16:57
13

to get the result DataFrame sum or average, `df.groupby(pd.TimeGrouper(freq='M')).sum()` or `df.groupby(pd.TimeGrouper(freq='M')).mean()` – Alexandre Jan 28 '16 at 21:58
16

`pd.TimeGrouper` has been deprecated in favor of `pd.Grouper`, which is a bit more flexible but still takes `freq` and `level` arguments. – BallpointBen Dec 18 '18 at 01:18
the first method doesn't not appear to work. It gives the error, 'Series object has no attribute 'month'' for a Series created via `to_datetime`. – ely Sep 09 '19 at 20:36
1

@ely The answer implicitly relies on the lines in the original question where `b` is given an index after being read from CSV. Add `b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')` after the line `b = pd.read_csv('b.dat')`. [I've edited the answer just now too.] – goodside Mar 05 '20 at 18:09
1

To group the months in chronological order, you need to swap the month and year index. The resulting command for the grouping being `b.groupby(by=[b.index.year, b.index.month])`. – MangoNrFive Oct 24 '22 at 17:36

score 116 · Answer 2 · answered Jan 20 '18 at 12:38

116

(update: 2018)

Note that pd.Timegrouper is depreciated and will be removed. Use instead:

 df.groupby(pd.Grouper(freq='M'))

answered Jan 20 '18 at 12:38

PandasRocks

1,609
1
10
12

3

Find the Grouper docs [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html?highlight=grouper) and the frequency specifications (`freq=...`) [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). Some examples are `freq=D` for **days**, `freq=B` for **business days**, `freq=W` for **weeks** or even `freq=Q` for **quarters**. – Kim Mar 25 '20 at 12:09
16

I found it useful to use 'key' to avoid having to reindex the df, as follows: df.groupby(pd.Grouper(key='your_date_column', freq='M')) – Edward Oct 15 '20 at 15:00
Does this work if you're grouping by two columns, only one of which is datetime value column? – exlo Mar 17 '21 at 03:23
2

speeding up the further research for those who want to group by specific column (or more): df.groupby(['col1', pd.Grouper(key='date_col', freq='1M')]).agg({ 'col2': 'sum', 'col3': 'max' }) – Manaslu Apr 27 '21 at 08:09

Mykola Zotko · Answer 3 · 2021-07-05T14:17:25.397

29

To groupby time-series data you can use the method resample. For example, to groupby by month:

df.resample(rule='M', on='date')['Values'].sum()

The list with offset aliases you can find here.

edited Jul 05 '21 at 14:17

answered Jun 08 '21 at 08:16

Mykola Zotko

15,583
3
71
73

3

Thanks! This should be the accepted answer: `groupby` by time is called `resample`. – Michael Dorner Aug 08 '22 at 12:50

jpp · Answer 4 · 2020-11-25T17:58:51.497

One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column.

Normalise day of month

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})

# normalize day to beginning of month, 4 alternative methods below
df['YearMonth'] = df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))
df['YearMonth'] = df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)

Then use groupby as normal:

g = df.groupby('YearMonth')

res = g['Values'].sum()

# YearMonth
# 2017-09-01    20
# 2017-10-01    30
# Name: Values, dtype: int64

Comparison with `pd.Grouper`

The subtle benefit of this solution is, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, and therefore you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.

Comparison with string conversion

An alternative to the above idea is to convert to a string, e.g. convert datetime 2017-10-XX to string '2017-10'. However, this is not recommended since you lose all the efficiency benefits of a datetime series (stored internally as numerical data in a contiguous memory block) versus an object series of strings (stored as an array of pointers).

See this answer for the proper way to utilize offsets when there are already day=1 values : https://stackoverflow.com/a/45831333/9987623. — AlexK, Nov 22 '20 at 04:08
@AlexK, does `pd.tseries.offsets` have an advantage over `pd.tseries.MonthBegin` ? — jpp, Nov 22 '20 at 12:07
sorry, I don't know enough to tell those apart. I just added the comment because your `df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)` code above changes any date that is already the first of the month to the first of the previous month. — AlexK, Nov 23 '20 at 01:27
For beginning of month with `Grouper`, see also https://stackoverflow.com/a/56280791/10495893 — Ben Reiniger, Aug 05 '22 at 15:04

score 11 · Answer 5 · answered Oct 11 '18 at 10:58

11

Slightly alternative solution to @jpp's but outputting a YearMonth string:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

res = df.groupby('YearMonth')['Values'].sum()

answered Oct 11 '18 at 10:58

tsando

4,557
2
33
35

Helpful in the case where you don't want to set your date column as an index – user2723494 Apr 18 '22 at 13:37
Can also generate strings using dt.strfrtime: `df['Date'].dt.strftime('%Y %b.')` – Raisin May 19 '22 at 17:33

Pandas dataframe groupby datetime month

5 Answers5

Normalise day of month

Comparison with `pd.Grouper`

Comparison with string conversion

Linked

Related

Pandas dataframe groupby datetime month

5 Answers5

Normalise day of month

Comparison with pd.Grouper

Comparison with string conversion

Linked

Related

Comparison with `pd.Grouper`