How can we loop through items in a dataframe and create a bar charts for each 'group' of items?

Question

I have a dataframe that looks like this.

import pandas as pd
# Intitialise data of lists
data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-02-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-03-01', 'Expense':7000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-04-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-01-01', 'Expense':5000, 'Revenue':6000, 'Building':'Casino'}, 
       {'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'Casino'},
       {'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'Casino'},
       {'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Casino'}]
df = pd.DataFrame(data)
df

How can I create a loop to iterate over all 'groups' of places? A group is Stadium, Casino, etc. I'm trying to get a time series of revenue and expense data, either stacked with on top of one the other, or one next to the other. I took a crack at it and came up with this hackey-code, but of course it doesn't work, or I wouldn't be asking my question here.

import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, figsize=(12,3))

grouped = df.groupby('Building')

for i, group in grouped:
    #print(i)
    plt.figure()
    df.loc['Month'].hist(column='Month',
                                   bins=10,
                                   ax=axes[i],
                                   sharey=True)
    axes[i].set_title('Year in {0}'.format())
    axes[i].set_xlabel('Value')
    axes[i].set_ylabel('Count')
    plt.show()
plt.show()

I just updated my question with the df.groupby code. The charting/plotting still doesn't work for me. — ASH, Aug 29 '21 at 21:49
I am a bit confused as to why you are using a histogram. It seems more like a bar chart organized by month with each subplot being a different building type is what you are after? — Derek O, Aug 29 '21 at 22:06
`dfm = df.melt(id_vars=['Month', 'Building'], value_vars=['Expense', 'Revenue'])` and then `sns.catplot(kind='bar', data=dfm, x='Month', y='value', hue='variable', col='Building')` — Trenton McKinney, Aug 29 '21 at 22:19
For a lineplot, convert `'Month'` to a datetime dtype with `df.Month = pd.to_datetime(df.Month).dt.date` , do `df.melt` and plot `sns.relplot(kind='line', data=dfm, x='Month', y='value', hue='variable', col='Building', aspect=1.5)` — Trenton McKinney, Aug 29 '21 at 22:22
@Derek, you are right. I started off in the wrong direction, pivoted, and then got to where I really wanted to be. — ASH, Aug 30 '21 at 12:52
@Trenton, thanks. This is exactly what I want to do. My initial approach was a bit out of alignment. Thanks for helping me get this straightened out! — ASH, Aug 30 '21 at 12:53

Derek O · Accepted Answer · 2021-08-29T22:30:31.883

I think that a stacked bar chart for expenses and revenues over the months should work - a histogram shouldn't be needed unless you are summing expenses and revenues up over a period of time longer than a month. You can have two subplots of stacked bar charts for the different building types, with a shared y-axis between the subplots.

import pandas as pd
import matplotlib.pyplot as plt

# Intitialise data of lists
data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-02-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-03-01', 'Expense':7000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-04-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-01-01', 'Expense':5000, 'Revenue':6000, 'Building':'Casino'}, 
       {'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'Casino'},
       {'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'Casino'},
       {'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Casino'}]
df = pd.DataFrame(data)

n_cols = 2
# n_bins = 10
fig, axes = plt.subplots(1, n_cols, sharey=True, figsize=(12,4))

grouped = df.groupby('Building')

for i, (group_name, group) in enumerate(grouped):
       axes[i].bar(group['Month'], group['Expense'], label="Expense")
       axes[i].bar(group['Month'], group['Revenue'], label="Revenue", bottom=group['Expense'])
       axes[i].set_title(f"Revenue and Expense for {group_name}")
       axes[i].set_xlabel('Month')
       axes[i].set_ylabel('Count')
       axes[i].legend()
              
plt.show()

How can we loop through items in a dataframe and create a bar charts for each 'group' of items?

1 Answers1

Linked