39

I have a dataframe, df, which is as follows:

| date      | Revenue |
|-----------|---------|
| 6/2/2017  | 100     |
| 5/23/2017 | 200     |
| 5/20/2017 | 300     |
| 6/22/2017 | 400     |
| 6/21/2017 | 500     |

I need to group the above data by month to get output as:

| date | SUM(Revenue) |
|------|--------------|
| May  | 500          |
| June | 1000         |

I tried this code, but it did not work:

df.groupby(month('date')).agg({'Revenue': 'sum'})

I want to use only Pandas or NumPy, with no additional libraries.

Peter Mortensen
Symphony
  • `df.groupby(pd.Grouper(key='Date', freq='M')).agg({'Revenue': 'sum'})`, this assumes the data type of the date column is datetime – gold_cy Jul 04 '17 at 14:25

6 Answers

78

Try this:

In [6]: df['date'] = pd.to_datetime(df['date'])

In [7]: df
Out[7]:
        date  Revenue
0 2017-06-02      100
1 2017-05-23      200
2 2017-05-20      300
3 2017-06-22      400
4 2017-06-21      500
In [59]: df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum().sort_values()
Out[59]:
date
May      500
June    1000
shivsn
  • Up vote because it's the only answer which formats the `date` column properly – gold_cy Jul 04 '17 at 14:42
  • FYI this gives you a string column for the date, which is not as performant nor as useful as real resampling / time grouping – Jeff Jul 04 '17 at 15:30
  • @shivsn: can this be sorted by date, ascending (May 500 and then June 1000)? – Symphony Jul 04 '17 at 19:02
  • What do you mean by df? How do you import df? – Ragulan Sep 05 '18 at 04:42
  • @Ragulan28 df is the name of the DataFrame, e.g. df = pd.read_csv('file.csv') – Paras jain Dec 15 '18 at 08:43
  • I want to add one more groupby column in your solution, like lead_visits.groupby(['date', 'status'], as_index=False), but with the month of each date instead of 'date' – Akshit jain Nov 20 '19 at 10:21
  • Method 3 by @yongtw123 [here](https://stackoverflow.com/a/50962588/10916136) is much faster. – Meet Jun 18 '21 at 10:39
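Regarding Symphony's ordering question above: month names sort alphabetically, so one way to keep calendar order without extra libraries is an ordered Categorical. A sketch, assuming the question's sample data (the `months` variable is introduced here for illustration):

```python
import calendar

import pandas as pd

df = pd.DataFrame({'date': ['6/2/2017', '5/23/2017', '5/20/2017',
                            '6/22/2017', '6/21/2017'],
                   'Revenue': [100, 200, 300, 400, 500]})
df['date'] = pd.to_datetime(df['date'])

# Make the month names an ordered categorical (January..December) so that
# groupby sorts them in calendar order instead of alphabetically
months = pd.Categorical(df['date'].dt.strftime('%B'),
                        categories=list(calendar.month_name)[1:],
                        ordered=True)
out = df.groupby(months, observed=True)['Revenue'].sum()
print(out)
```

With `observed=True`, months that never occur in the data are dropped from the result.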
34

Try a groupby using a pandas Grouper:

df = pd.DataFrame({'date': ['6/2/2017', '5/23/2017', '5/20/2017', '6/22/2017', '6/21/2017'],
                   'Revenue': [100, 200, 300, 400, 500]})
df.date = pd.to_datetime(df.date)
dg = df.groupby(pd.Grouper(key='date', freq='1M')).sum()  # group by each 1 month
dg.index = dg.index.strftime('%B')

Output:

      Revenue
May       500
June     1000
qbzenker
  • This seems to be the best combination of performance and general usability – MichaelA Dec 23 '19 at 10:17
  • This is the most usable since the months are aggregated in order. This should be the top answer. – JJSanDiego Feb 02 '21 at 03:43
  • This doesn't work if your dates span multiple years. Then instead of just one row per month (say May and June above), it will have multiple May's and June's, one for each year. – Meet Jun 18 '21 at 10:36
  • This answer is wrong for that use case: it groups by year-month, not by month. – baklarz2048 Oct 27 '22 at 12:30
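As the comments note, `pd.Grouper(freq='M')` bins by year *and* month. A sketch with hypothetical multi-year data, showing the difference and one way to collapse across years:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2016-05-01', '2017-05-15', '2017-06-20']),
                   'Revenue': [50, 100, 200]})

# One bin per year-month: 2016-05 and 2017-05 stay separate,
# and every intermediate month appears as an (empty) bin
per_year_month = df.groupby(pd.Grouper(key='date', freq='M'))['Revenue'].sum()

# Collapse across years onto calendar months instead
per_month = df.groupby(df['date'].dt.month)['Revenue'].sum()
print(per_month)
```

Here `per_year_month` has 14 rows (2016-05 through 2017-06), while `per_month` has just two (month 5 and month 6).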
12

For a DataFrame with many rows, using strftime takes considerably more time. If the date column already has dtype datetime64[ns] (use pd.to_datetime() to convert, or specify parse_dates during CSV import, etc.), you can group directly on a datetime property (Method 3). The speedup is substantial.

import numpy as np
import pandas as pd

T = pd.date_range(pd.Timestamp(0), pd.Timestamp.now()).to_frame(index=False)
T = pd.concat([T for i in range(1,10)])
T['revenue'] = pd.Series(np.random.randint(1000, size=T.shape[0]))
T = T.rename(columns={0: 'date'})

print(T.shape) #(159336, 2)
print(T.dtypes) #date: datetime64[ns], revenue: int32

Method 1: strftime

%timeit -n 10 -r 7 T.groupby(T['date'].dt.strftime('%B'))['revenue'].sum()

1.47 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Method 2: Grouper

%timeit -n 10 -r 7 T.groupby(pd.Grouper(key='date', freq='1M')).sum()
#NOTE Manually map months as integer {01..12} to strings

56.9 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Method 3: datetime properties

%timeit -n 10 -r 7 T.groupby(T['date'].dt.month)['revenue'].sum()
#NOTE Manually map months as integer {01..12} to strings

34 ms ± 3.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

yongtw123
    Note that if you have data from more than 1 year, methods 1 and 3 aggregate over them whereas method 2 does not. Also, the result from method 1 is sorted alphabetically. – HenriV Sep 12 '18 at 14:08
  • How to arrange the outputs in ascending order, since the result is a Series, not a DataFrame? – alex3465 May 28 '21 at 12:17
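The "manually map months as integer {01..12} to strings" note in Methods 2 and 3 can be sketched as follows, using the question's sample data; grouping on the integer month also answers the ordering question above, since keys 1..12 sort in calendar order:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2017-06-02', '2017-05-23', '2017-05-20',
                                           '2017-06-22', '2017-06-21']),
                   'revenue': [100, 200, 300, 400, 500]})

# Method 3: group on the integer month (fast, calendar-ordered keys)
out = df.groupby(df['date'].dt.month)['revenue'].sum()

# Map the integer index {1..12} to month names; only 12 lookups,
# not one strftime per row (the year 2000 below is a dummy value)
out.index = out.index.map(lambda m: pd.Timestamp(year=2000, month=m, day=1).month_name())
print(out)
```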
3

This will work better. Try this:

# Explicitly convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Set the date column as the index
df.set_index('Date', inplace=True)

# Use 'M' for monthly; change the frequency string if you need another period
df['Revenue'].resample('M').sum()

This code gives the same result as shivsn's answer in the first post, but with the date set as the index we can perform many more operations on the data.

For example, sorted by value:

>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df.set_index('Date',inplace=True)
>>> df['withdrawal'].resample('M').sum().sort_values()
Date
2019-10-31     28710.00
2019-04-30     31437.00
2019-07-31     39728.00
2019-11-30     40121.00
2019-05-31     46495.00
2020-02-29     57751.10
2019-12-31     72469.13
2020-01-31     76115.78
2019-06-30     76947.00
2019-09-30     79847.04
2020-03-31     97920.18
2019-08-31    205279.45
Name: withdrawal, dtype: float64

shivsn's code gives the same totals, keyed by month name:

>>> df.groupby(df['Date'].dt.strftime('%B'))['withdrawal'].sum().sort_values()
Date
October       28710.00
April         31437.00
July          39728.00
November      40121.00
May           46495.00
February      57751.10
December      72469.13
January       76115.78
June          76947.00
September     79847.04
March         97920.18
August       205279.45
Name: withdrawal, dtype: float64
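To illustrate the "lot more operations" point: with a DatetimeIndex, resample can apply several aggregations in one pass. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-05-20', '2017-06-02', '2017-06-22']),
                   'Revenue': [300, 100, 400]}).set_index('Date')

# One pass, several aggregations per month
monthly = df['Revenue'].resample('M').agg(['sum', 'mean', 'count'])
print(monthly)
```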
    Why would you have to set the date as the `DataFrame` index when `groupby` works just fine? What happens if you have same date repeated? I would also consider the date formatting as first step, the highest voted answer, because it is not clear from the question if that column will be properly formatted. – user7440787 Apr 06 '20 at 17:50
  • Whether it is a duplicate or unique, it is going to sum up all those values; it acts the same as groupby. And apart from the groupby options, by setting the date column as the index you can perform a lot more basic operations. Thanks. – Jeywanth Kannan Apr 07 '20 at 20:14
0

Try this:

  1. Change the Date column into datetime format:

     df['Date'] = pd.to_datetime(df['Date'])

  2. Insert a new column in the data frame holding the month name, like ['May', 'June']:

     df['months'] = df['Date'].apply(lambda x: x.strftime('%B'))

     Here x is a date taken from the Date column of the data frame.

  3. Now group the data by the months column and sum the revenue:

     response_data_frame = df.groupby('months')['Revenue'].sum()
     print(response_data_frame)

Output (groupby sorts the month names alphabetically):

months
June    1000
May      500
Name: Revenue, dtype: int64
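The steps above can be put together as one runnable sketch, using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['6/2/2017', '5/23/2017', '5/20/2017',
                            '6/22/2017', '6/21/2017'],
                   'Revenue': [100, 200, 300, 400, 500]})

df['Date'] = pd.to_datetime(df['Date'])                      # step 1
df['months'] = df['Date'].apply(lambda x: x.strftime('%B'))  # step 2
response_data_frame = df.groupby('months')['Revenue'].sum()  # step 3
print(response_data_frame)
```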
Shubham gupta
0
df['Month'] = pd.DatetimeIndex(df['date']).month_name()

Using this you should get:

| date      | Revenue | Month |
|-----------|---------|-------|
| 6/2/2017  | 100     | June  |
| 5/23/2017 | 200     | May   |
| 5/20/2017 | 300     | May   |
| 6/22/2017 | 400     | June  |
| 6/21/2017 | 500     | June  |
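To finish the aggregation from that Month column, a short sketch using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'date': ['6/2/2017', '5/23/2017', '5/20/2017',
                            '6/22/2017', '6/21/2017'],
                   'Revenue': [100, 200, 300, 400, 500]})
df['Month'] = pd.DatetimeIndex(df['date']).month_name()

# Sum revenue per month name
result = df.groupby('Month')['Revenue'].sum()
print(result)
```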
Kenrich