1

I have a dataframe that looks like this.

import pandas as pd
# Intitialise data of lists
data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-02-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-03-01', 'Expense':7000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-04-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-01-01', 'Expense':5000, 'Revenue':6000, 'Building':'Casino'}, 
       {'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'Casino'},
       {'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'Casino'},
       {'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Casino'}]
df = pd.DataFrame(data)
df

enter image description here

How can I create a loop to iterate over all 'groups' of places? A group is Stadium, Casino, etc. I'm trying to get a time series of revenue and expense data, either stacked with on top of one the other, or one next to the other. I took a crack at it and came up with this hackey-code, but of course it doesn't work, or I wouldn't be asking my question here.

import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, figsize=(12,3))

grouped = df.groupby('Building')

for i, group in grouped:
    #print(i)
    plt.figure()
    df.loc['Month'].hist(column='Month',
                                   bins=10,
                                   ax=axes[i],
                                   sharey=True)
    axes[i].set_title('Year in {0}'.format())
    axes[i].set_xlabel('Value')
    axes[i].set_ylabel('Count')
    plt.show()
plt.show()
ASH
  • 20,759
  • 19
  • 87
  • 200
  • I just updated my question with the df.groupby code. The charting/plotting still doesn't work for me. – ASH Aug 29 '21 at 21:49
  • df.loc['Month'] is not correct – IoaTzimas Aug 29 '21 at 22:05
  • I am a bit confused as to why you are using a histogram. It seems more like a bar chart organized by month with each subplot being a different building type is what you are after? – Derek O Aug 29 '21 at 22:06
  • 1
    `dfm = df.melt(id_vars=['Month', 'Building'], value_vars=['Expense', 'Revenue'])` and then `sns.catplot(kind='bar', data=dfm, x='Month', y='value', hue='variable', col='Building')` – Trenton McKinney Aug 29 '21 at 22:19
  • 1
    For a lineplot, convert `'Month'` to a datetime dtype with `df.Month = pd.to_datetime(df.Month).dt.date` , do `df.melt` and plot `sns.relplot(kind='line', data=dfm, x='Month', y='value', hue='variable', col='Building', aspect=1.5)` – Trenton McKinney Aug 29 '21 at 22:22
  • @Derek, you are right. I started off in the wrong direction, pivoted, and then got to where I really wanted to be. – ASH Aug 30 '21 at 12:52
  • 2
    @Trenton, thanks. This is exactly what I want to do. My initial approach was a bit out of alignment. Thanks for helping me get this straightened out! – ASH Aug 30 '21 at 12:53

1 Answers1

1

I think that a stacked bar chart for expenses and revenues over the months should work - a histogram shouldn't be needed unless you are summing expenses and revenues up over a period of time longer than a month. You can have two subplots of stacked bar charts for the different building types, with a shared y-axis between the subplots.

import pandas as pd
import matplotlib.pyplot as plt

# Intitialise data of lists
data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-02-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-03-01', 'Expense':7000, 'Revenue':5000, 'Building':'Stadium'}, 
       {'Month': '2020-04-01', 'Expense':3000, 'Revenue':4000, 'Building':'Stadium'},
       {'Month': '2020-01-01', 'Expense':5000, 'Revenue':6000, 'Building':'Casino'}, 
       {'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'Casino'},
       {'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'Casino'},
       {'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Casino'}]
df = pd.DataFrame(data)

n_cols = 2
# n_bins = 10
fig, axes = plt.subplots(1, n_cols, sharey=True, figsize=(12,4))

grouped = df.groupby('Building')

for i, (group_name, group) in enumerate(grouped):
       axes[i].bar(group['Month'], group['Expense'], label="Expense")
       axes[i].bar(group['Month'], group['Revenue'], label="Revenue", bottom=group['Expense'])
       axes[i].set_title(f"Revenue and Expense for {group_name}")
       axes[i].set_xlabel('Month')
       axes[i].set_ylabel('Count')
       axes[i].legend()
              
plt.show()

enter image description here

Derek O
  • 16,770
  • 4
  • 24
  • 43