5

I have a dataframe with a date column (type datetime). I can easily extract the year or the month to perform groupings, but I can't find a way to extract both year and month at the same time from a date. I need to analyze performance of a product over a 1 year period and make a graph with how it performed each month. Naturally I can't just group by month because it will add the same months for 2 different years, and grouping by year doesn't produce my desired results because I need to look at performance monthly.

I've been looking at several solutions, but none of them have worked so far.

So basically, my current dates look like this

2018-07-20
2018-08-20
2018-08-21
2018-10-11
2019-07-20
2019-08-21

And I'd just like to have 2018-07, 2018-08, 2018-10, and so on.

Umar.H
  • 22,559
  • 7
  • 39
  • 74
Dasphillipbrau
  • 524
  • 2
  • 8
  • 17
  • Is the date an index? – Umar.H Dec 09 '19 at 15:14
  • Check out [`to_period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.to_period.html) – Chris Adams Dec 09 '19 at 15:17
  • Just as an alternative, since you can extract `year` and `month` individually, you can go `groupby(['year', 'month'])`. It gives you more latitude when it comes to visualizations. – union77 Jun 06 '22 at 03:25

5 Answers5

18

You can use to_period

df['month_year'] = df['date'].dt.to_period('M')
ATL
  • 521
  • 3
  • 8
6

If they are stored as datetime you should be able to create a string with just the year and month to group by using datetime.strftime (https://strftime.org/).

It would look something like:

df['ym-date'] = df['date'].dt.strftime('%Y-%m')
JDenman6
  • 329
  • 2
  • 8
  • 1
    This code snippet will not work, it will throw an AttributeError, you need to do: `df['ym-date'] = df['Date'].dt.strftime('%Y-%m')` – Varun Balupuri Dec 09 '19 at 15:21
2

If you have some data that uses datetime values, like this:

sale_date = [
    pd.date_range('2017', freq='W', periods=121).to_series().reset_index(drop=True).rename('Sale Date'),
    pd.Series(np.random.normal(1000, 100, 121)).rename('Quantity')
]
sales = pd.concat(data, axis='columns')

Sample data

You can group by year and date simultaneously like this:

d = sales['Sale Date']
sales.groupby([d.dt.year.rename('Year'), d.dt.month.rename('Month')]).sum()

enter image description here

You can also create a string that represents the combination of month and year and group by that:

ym_id = d.apply("{:%Y-%m}".format).rename('Sale Month')
sales.groupby(ym_id).sum()

enter image description here

stevepastelan
  • 1,264
  • 7
  • 11
1

A couple of options, one is to map to the first of each month:

Assuming your dates are in a column called 'Date', something like:

df['Date_no_day'] = df['Date'].apply(lambda x: x.replace(day=1))

If you are really keen on storing the year and month only, you could map to a (year, month) tuple, eg:

df['Date_no_day'] = df['Date'].apply(lambda x: (x.year, x.month))

From here, you can groupby/aggregate by this new column and perform your analysis

Varun Balupuri
  • 363
  • 5
  • 17
0

One way could be to transform the column to get the first of month for all of these dates and then create your analsis on month to month:

date_col = pd.to_datetime(['2011-09-30', '2012-02-28'])
new_col = date_col + pd.offsets.MonthBegin(1)

Here your analysis remains intact as monthly

Sushant
  • 511
  • 2
  • 13