I'm having a difficult time creating a bar plot from a DataFrame grouped by year and month. With the following code I'm trying to plot the data in the image I created, but instead a second image is returned. I also tried to move the legend to the right and change its values to the corresponding month names.

I've started to get a feel for the DataFrames produced by the groupby command, but not getting the result I expected led me to ask here.

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.read_csv('fcc-forum-pageviews.csv', index_col='date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]
fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
plt.show()

This is the format of the data that I am analyzing.

date,value
2016-05-09,1201
2016-05-10,2329
2016-05-11,1716
2016-05-12,10539
2016-05-13,6933
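A minimal sketch of reading data in this format, assuming the five sample rows above stand in for the real file. Without `parse_dates`, the index would hold plain strings and `.index.year` would not be available, so the column is converted while reading. The name `monthly_mean` is only illustrative.

```python
import io
import pandas as pd

# The five sample rows from the question, used in place of the real CSV file
csv_text = """date,value
2016-05-09,1201
2016-05-10,2329
2016-05-11,1716
2016-05-12,10539
2016-05-13,6933
"""

# parse_dates converts 'date' to datetime before it becomes the index
df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=['date'])

# With a DatetimeIndex, .year and .month can be used as grouping keys
monthly_mean = df.value.groupby([df.index.year, df.index.month]).mean()
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file name.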

Trenton McKinney

2 Answers

  1. Add a sorted categorical 'month' column with pd.Categorical
  2. Transform the dataframe to a wide format with pd.pivot_table where aggfunc='mean' is the default.
    • Wide format is typically best for plotting grouped bars.
  3. pandas.DataFrame.plot returns matplotlib.axes.Axes, so there's no need to use fig, ax = plt.subplots(figsize=(10,10)).
  4. The pandas .dt accessor extracts components of 'date' (year, month name, and so on), so 'date' must have a datetime dtype.
    • If 'date' is not a datetime dtype, convert it with df.date = pd.to_datetime(df.date).
  5. Tested with python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2
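Point 3 is also why the question's code ends up with two figures. The sketch below uses a small hypothetical frame (`wide`) and counts the bar patches on each Axes to show where the bars land. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide-format frame: one row per year, one column per month number
wide = pd.DataFrame({1: [10, 20], 2: [15, 25]}, index=[2021, 2022])

# Axes created up front, as in the question
fig, ax = plt.subplots(figsize=(10, 10))

# Without ax=, DataFrame.plot draws on a new figure and returns its Axes
ax_new = wide.plot(kind="bar")
empty_bars = len(ax.patches)   # the Axes created above is still empty

# With ax=, the bars are drawn on the existing Axes, which is also returned
ax_same = wide.plot(kind="bar", ax=ax)
filled_bars = len(ax.patches)  # 2 years x 2 months = 4 bars
```

Either approach works: keep the returned Axes and skip `plt.subplots`, or create the Axes first and pass it with `ax=`.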

Imports and Test Data

import pandas as pd
from calendar import month_name  # conveniently supplies a list of sorted month names or you can type them out manually
import numpy as np  # for test data

# test data and dataframe
np.random.seed(365)
rows = 365 * 3
data = {'date': pd.date_range('2021-01-01', freq='D', periods=rows), 'value': np.random.randint(100, 1001, size=rows)}
df = pd.DataFrame(data)

# select data within specified quantiles
df = df[df.value.gt(df.value.quantile(0.025)) & df.value.lt(df.value.quantile(0.975))]

# display(df.head())
        date  value
0 2021-01-01    694
1 2021-01-02    792
2 2021-01-03    901
3 2021-01-04    959
4 2021-01-05    528

Transform and Plot

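  • If 'date' has been set as the index, as stated in the comments, replace the month-column line below with the following. The index must be a DatetimeIndex, and pd.pivot_table then needs index=df.index.year:
    • df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)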
# create the month column
months = month_name[1:]
df['months'] = pd.Categorical(df.date.dt.strftime('%B'), categories=months, ordered=True)

# pivot the dataframe into the correct shape
dfp = pd.pivot_table(data=df, index=df.date.dt.year, columns='months', values='value')

# display(dfp.head())
months  January  February  March  April    May   June   July  August  September  October  November  December
date                                                                                                        
2021      637.9     595.7  569.8  508.3  589.4  557.7  508.2   545.7      560.3    526.2     577.1     546.8
2022      567.9     521.5  625.5  469.8  582.6  627.3  630.4   474.0      544.1    609.6     526.6     572.1
2023      521.1     548.5  484.0  528.2  473.3  547.7  525.3   522.4      424.7    561.3     513.9     602.3

# plot
ax = dfp.plot(kind='bar', figsize=(12, 4), ylabel='Mean Page Views', xlabel='Year', rot=0)
_ = ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
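For the case where 'date' is the index rather than a column, the same steps could look roughly like this. It is a sketch with made-up test data (`idx` and the `np.arange` values are assumptions for illustration), not the code that produced the table above.

```python
from calendar import month_name
import numpy as np
import pandas as pd

# Hypothetical test data with 'date' as a DatetimeIndex (two full years)
idx = pd.date_range('2021-01-01', periods=730, freq='D', name='date')
df = pd.DataFrame({'value': np.arange(730)}, index=idx)

# ordered month names, January through December
months = month_name[1:]

# build the categorical month column from the index instead of df.date.dt
df['months'] = pd.Categorical(df.index.strftime('%B'), categories=months, ordered=True)

# group rows by the year of the index; the mean is taken per year and month
dfp = pd.pivot_table(data=df, index=df.index.year, columns='months', values='value')
```

The resulting `dfp` has one row per year and one column per month, so the `dfp.plot(...)` and `ax.legend(...)` calls above apply unchanged.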

Trenton McKinney

Just pass the ax you defined to pandas:

bar_plot.plot(ax = ax, kind='bar')

If you also want to replace the month numbers with names, get the legend labels, convert the numbers to names, and redefine the legend with the new labels:

handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))
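As an alternative to building a `datetime.date` for each label, the month names can be looked up in `calendar.month_name`. This is a small sketch; the `labels` list is a hypothetical stand-in for what `ax.get_legend_handles_labels()` returns.

```python
from calendar import month_name

# Hypothetical stand-in for the legend labels, which hold the month numbers
labels = ['1', '2', '12']

# month_name[0] is an empty string, so month numbers index it directly
new_labels = [month_name[int(label)] for label in labels]
```

The resulting `new_labels` can be passed to `ax.legend` exactly as in the line above. The names follow the current locale.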

Complete Code

import pandas as pd
from matplotlib import pyplot as plt
import datetime

df = pd.read_csv('fcc-forum-pageviews.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
line_plot = df.value[(df.value > df.value.quantile(0.025)) & (df.value < df.value.quantile(0.975))]

fig, ax = plt.subplots(figsize=(10,10))
bar_plot = line_plot.groupby([line_plot.index.year, line_plot.index.month]).mean().unstack()
bar_plot.plot(ax = ax, kind='bar')
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')

handles, labels = ax.get_legend_handles_labels()
new_labels = [datetime.date(1900, int(monthinteger), 1).strftime('%B') for monthinteger in labels]
ax.legend(handles = handles, labels = new_labels, loc = 'upper left', bbox_to_anchor = (1.02, 1))

plt.show()

(plot generated with fake data)

Zephyr