How to write a function to plot a variable over time after manipulating the dataframe with pivot_table and MultiIndexes

Question

The following code runs without errors and creates a dataframe:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import matplotlib as mpl

sns.set()

df = pd.DataFrame({ 

    # some ways to create random data
    'scenario':np.random.choice( ['BAU','ETS','ESD'], 27),
    'region':np.random.choice( ['Italy','France'], 27),
    'variable':np.random.choice( ['GDP','GHG'], 27),
    # one column of random values per year
    '2015':np.random.randn(27),
    '2016':np.random.randn(27),
    '2017':np.random.randn(27),
    '2018':np.random.randn(27),
    '2019':np.random.randn(27),
    '2020':np.random.randn(27),
    '2021':np.random.randn(27)
    })

df2=pd.melt(df,id_vars=['scenario','region','variable'],var_name='year')
all_names_index = df2.set_index(['scenario','region','variable','year']).sort_index()

Then I use a function to plot iteratively:

def name_plot(scenario, region, variable):
    data = all_names_index.loc[scenario, region, variable]
    plt.plot(data.index, data.value, label='%s' % scenario)

font = {'family' : 'sans-serif',
        'weight' : 'bold',
        'size'   : 13}
plt.rc('font', **font)
names = ['BAU','ETS', 'ESD']
for scenario in names:
    name_plot(scenario, 'Italy', 'GHG')
    plt.xlabel('Years')
    plt.ylabel('MtCO2e')
    plt.title('Emissions Pathways')
    plt.legend() 
    plt.savefig('EMIp.png')
plt.clf()

As I need to create a region EU as the sum of the other two countries, I create a pivot_table:

map_eu = {
        'EU' : ['Italy','France']
}
df3=pd.pivot_table(df2, 'value', ['scenario', 'variable', 'year'], 'region')
for k, v in map_eu.items():
    df3[k] = df3[v].sum(axis=1)
df3 = df3.stack(0).unstack(1)
df3.sort_index(axis=0, inplace=True)

How can I plot df3 with the name_plot function defined above? I cannot work out how to rearrange the pivot_table so that it has the same structure as df2.

Please post actual [reproducible data](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [not images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557). Remember we do not have access to `data.xlsx`. — Parfait, May 18 '20 at 20:49

score 0 · Accepted Answer · answered May 19 '20 at 21:21

Consider changing your plotting function to take the data frame as an argument rather than relying on one in the global environment (i.e., all_names_index). Also, run all plotting operations inside the function for better code organization, and build the title and file name dynamically from the arguments.

def name_plot(df, scenario, region, variable):
    data = df.loc[scenario, region, variable]

    plt.plot(data.index, data['value'], label=scenario)

    plt.xlabel('Years')
    plt.ylabel('MtCO2e')
    plt.title(f'{region} {variable} - Emissions Pathways')
    plt.legend() 

    plt.tight_layout()
    plt.savefig(f'{scenario} {region} {variable} EMP.png')
    plt.clf()
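
Note that the function above clears the figure after every call, so each scenario ends up in its own file. If all scenarios should share one figure, as in the loop from the question, a possible variant is sketched below. The helper name `plot_scenarios`, its `filename` parameter, and the small `demo` frame are assumptions made for illustration; they are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_scenarios(df, scenarios, region, variable, filename=None):
    """Hypothetical helper: draw one line per scenario on a single Axes."""
    fig, ax = plt.subplots()
    for scenario in scenarios:
        # xs selects the partial key and drops those levels, leaving 'year' as the index
        data = df.xs((scenario, region, variable))
        ax.plot(data.index, data['value'], label=scenario)
    ax.set_xlabel('Years')
    ax.set_ylabel('MtCO2e')
    ax.set_title(f'{region} {variable} - Emissions Pathways')
    ax.legend()
    fig.tight_layout()
    if filename is not None:
        fig.savefig(filename)
    return fig, ax


# Small deterministic frame (made-up values) with the same index layout as all_names_index
years = ['2015', '2016', '2017']
rows = [(s, 'Italy', 'GHG', y, float(i + j))
        for i, s in enumerate(['BAU', 'ETS', 'ESD'])
        for j, y in enumerate(years)]
demo = (pd.DataFrame(rows, columns=['scenario', 'region', 'variable', 'year', 'value'])
          .set_index(['scenario', 'region', 'variable', 'year'])
          .sort_index())

fig, ax = plot_scenarios(demo, ['BAU', 'ETS', 'ESD'], 'Italy', 'GHG')
```

Working on an explicit Axes object instead of the implicit pyplot state keeps each figure independent, so nothing has to be cleared between calls.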

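Then melt your pivot_table data frame back into long format and set the same four-level index, so that both data frames can be passed to the function: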

### ITALY PLOTS
names = ['BAU', 'ETS', 'ESD']

for scenario in names:
    name_plot(all_names_index, scenario, 'Italy', 'GHG')

### EU PLOTS
df3 = pd.pivot_table(df2, 'value', ['scenario', 'variable', 'year'], 'region')

map_eu = {'EU' : ['Italy','France']}
for k,v in map_eu.items():
    df3[k] = df3.reindex(v, axis='columns').sum(1)

df3 = (df3.reset_index()
          .melt(id_vars=['scenario','variable', 'year'], 
                var_name='region')
          .set_index(['scenario', 'region', 'variable', 'year'])
          .sort_index()
      )

for scenario in names:
    name_plot(df3, scenario, 'EU', 'GHG')
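
To check that the reshaped frame really has the same index layout as all_names_index and that the EU rows are the sum of the two countries, a small sanity check along these lines may help. It is a minimal sketch on made-up values; the names `long_df`, `wide`, and `tidy` are hypothetical stand-ins for df2 and df3. Keep in mind that pivot_table averages duplicate rows by default (aggfunc='mean'), which matters for the random data in the question, where the same scenario/region/variable combination can occur more than once.

```python
import pandas as pd

# Tiny deterministic long-format frame standing in for df2 (values are made up)
long_df = pd.DataFrame({
    'scenario': ['BAU'] * 4,
    'region':   ['Italy', 'France', 'Italy', 'France'],
    'variable': ['GHG'] * 4,
    'year':     ['2015', '2015', '2016', '2016'],
    'value':    [1.0, 2.0, 3.0, 4.0],
})

# Wide layout: one column per region, then the aggregate EU column
wide = pd.pivot_table(long_df, values='value',
                      index=['scenario', 'variable', 'year'], columns='region')
wide['EU'] = wide[['Italy', 'France']].sum(axis=1)

# Back to long format with the four-level index used by the plotting function
tidy = (wide.reset_index()
            .melt(id_vars=['scenario', 'variable', 'year'], var_name='region')
            .set_index(['scenario', 'region', 'variable', 'year'])
            .sort_index())

eu = tidy.xs(('BAU', 'EU', 'GHG'))['value']
parts = (tidy.xs(('BAU', 'Italy', 'GHG'))['value']
         + tidy.xs(('BAU', 'France', 'GHG'))['value'])

print(list(tidy.index.names))
print((eu == parts).all())
```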
