2

I have data like this that I want to plot by month and year using matplotlib.

df = pd.DataFrame({'date':['2018-10-01', '2018-10-05', '2018-10-20','2018-10-21','2018-12-06',
                            '2018-12-16', '2018-12-27', '2019-01-08','2019-01-10','2019-01-11',
                            '2019-01-12', '2019-01-13', '2019-01-25', '2019-02-01','2019-02-25', 
                            '2019-04-05','2019-05-05','2018-05-07','2019-05-09','2019-05-10'],
                  'counts':[10,5,6,1,2,
                            5,7,20,30,8,
                            9,1,10,12,50,
                            8,3,10,40,4]})

First, I converted the datetime format, and get the year and month from each date.

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

Then, I tried to do groupby like this.

aggmonth = df.groupby(['year', 'month']).sum()

And I want to visualize it in a barchart or something like that. But as you notice above, there are missing months in between the data. I want those missing months to be filled with 0s. I don't know how to do that in a dataframe like this. Previously, I asked this question about filling missing dates in a period of data. where I converted the dates to period range in month-year format.

by_month = pd.to_datetime(df['date']).dt.to_period('M').value_counts().sort_index()
by_month.index = pd.PeriodIndex(by_month.index)

df_month = by_month.rename_axis('month').reset_index(name='counts')
df_month

idx = pd.period_range(df_month['month'].min(), df_month['month'].max(), freq='M')
s = df_month.set_index('month').reindex(idx, fill_value=0)
s

But when I tried to plot s using matplotlib, it returned an error. It turned out you cannot plot a period data using matplotlib.

So basically I got these two ideas in my head, but both are stuck, and I don't know which one I should keep pursuing to get the result I want.

What is the best way to do this? Thanks.

catris25
  • 1,173
  • 3
  • 20
  • 40

1 Answers1

3

Convert the date column to pandas datetime series, then use groupby on monthly period and aggregate the data using sum, next use DataFrame.resample on the aggregated dataframe to resample using monthly frequency:

df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.to_period('M')).sum()
df1 = df1.resample('M').asfreq().fillna(0)

Plotting the data:

df1.plot(kind='bar')

enter image description here

Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53