2

I've created a pivot table with two levels of columns

pivotCust = bigData.pivot_table(index=['month'],columns=['year','usertype'],values='start_time',aggfunc = 'count')

This creates the table that I'm interested in:

year    2019    2020    2021
usertype    casual  member  casual  member  casual  member
month                       
1   4602    98670   7785    136099  18117   78717
2   2638    93548   12870   126715  10131   39491
3   15923   149688  27825   115593  84033   144463
4   47744   217566  23628   61148   136601  200629
5   81624   285834  86909   113365  256916  274717
6   130218  345177  154718  188287  370681  358914
7   175632  381683  269296  282184  442056  380354
8   186889  403295  289661  332700  412671  391681
9   129173  364046  230692  302266  363890  392257
10  71035   300751  145012  243641  257242  373984
11  18729   158447  88099   171617  106929  253049
12  16430   138662  30080   101493  69738   177802

But when I try to turn it into a bar graph (with the code below), it's hard to read, as it creates 72 columns -- six entries per month (casual/member * 3 years), for 12 months. Graph with six entries per month

pivotCust.plot(kind = 'bar',figsize=(17,10))

I'd like to turn this into a stacked graph, with three columns per month (1 per year) and the casual/member data in a stacked bar. But when I use the 'stacked = True' flag, I get a graph of 12 columns, with all the data stacked together.

pivotCust.plot(kind = 'bar',stacked = True, figsize=(17,10))

I think .melt or .unstack might be what I need to use to fix this, but I can't figure out how to use it correctly.

The answer here suggests that Seaborn might be useful, but, again, I can't figure out how to get it to produce the graph I desire.

Any suggestions would be greatly appreciated.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Feikname
  • 119
  • 1
  • 6

1 Answers1

1

There might be an easier approach, but I think the difficulty comes from the fact that you want to group your columns by month, stratified by year, and then further stratified by usertype. Seaborn boxplot makes it easy to stratify by one level using hue, but I don't know how to stratify by 2 levels like you need here.

Instead, as a hack I'm first plotting the sum of both user types, and then plotting just the member values on top. I'd argue that a lineplot would be easier to interpret. I've included one below the code.

I also melted your table to make seaborn happier

enter image description here

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import io #just used for reading in the example table

pivotCust = pd.read_csv(io.StringIO("""
1   4602    98670   7785    136099  18117   78717
2   2638    93548   12870   126715  10131   39491
3   15923   149688  27825   115593  84033   144463
4   47744   217566  23628   61148   136601  200629
5   81624   285834  86909   113365  256916  274717
6   130218  345177  154718  188287  370681  358914
7   175632  381683  269296  282184  442056  380354
8   186889  403295  289661  332700  412671  391681
9   129173  364046  230692  302266  363890  392257
10  71035   300751  145012  243641  257242  373984
11  18729   158447  88099   171617  106929  253049
12  16430   138662  30080   101493  69738   177802"""
),delim_whitespace=True, header=None, index_col=0)

pivotCust.index.name = 'month'

pivotCust.columns = pd.MultiIndex.from_product([
    [2019,2020,2021],
    ['casual','member'],
], names=['year','usertype'])


plot_df = pivotCust.reset_index().melt(id_vars='month')
plot_df['casual_member_sum'] = plot_df.groupby(['month','year'])['value'].transform('sum')

fig,ax = plt.subplots()

#Plot the sum of the two categories as background bars
sns.barplot(
    x = 'month',
    y = 'casual_member_sum',
    palette = 'Blues',
    hue = 'Total '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df,
)

#Plot just the members as foreground bars
sns.barplot(
    x = 'month',
    y = 'value',
    palette = 'Reds',
    hue = 'Member '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df[plot_df['usertype'].eq('member')],
)

plt.show()
plt.close()

Here's the lineplot approach with seaborn using the same plot_df created above. The lineplot is easy to make too

enter image description here

sns.lineplot(
    x = 'month',
    y = 'value',
    hue = 'year',
    style = 'usertype',
    data = plot_df,
)
plt.show()
plt.close()
mitoRibo
  • 4,468
  • 1
  • 13
  • 22
  • Thanks for the help! Your bar graph is much better than mine, but it's still pretty confusing. I think I will just separate the information into two separate graphs, one for "members" and the other for "casual". Easier to see how the groups vary over time, but harder to compare them to each other. – Feikname Aug 19 '22 at 02:43