
I want to group a subset of columns by name in order to create boxplots with pandas in Python.

I have the following dataset:

local_term_1year | regional_term_1year | local_term_2year | regional_term_2year      
-------------------------------------------------------------------------------
30               | 30                  | 40               | 50 
20               | 40                  | 50               | 60

I am hoping to create two grouped boxplots, one for 1year and another for 2year. If possible, I'd also like to color each box according to its local/regional tag.

So far I have been able to extract the suffix and prefix from each column into a separate table:

column              | year  | region
---------------------------------------
local_term_1year    | 1year | local
regional_term_1year | 1year | regional
local_term_2year    | 2year | local
regional_term_2year | 2year | regional 

I am not sure if this additional dataframe will help with the boxplot.

GrandmasLove

1 Answer


Assuming your pd.DataFrame is called df, you can reshape it from wide to long format and then derive the region and year from each column name:

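# Reshape from wide to long: one row per (column name, value) pair
new_df = df.melt(var_name='col', value_name='table_value')
# The prefix of each column name is the region, the suffix is the year
new_df['region'] = new_df['col'].str.split('_').str.get(0)
new_df['year'] = new_df['col'].str.split('_').str.get(-1)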
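To check the reshaping step, here is a minimal sketch that runs it on the two sample rows from the question. The df below is rebuilt by hand from the question's table, and parts is just a helper name I picked so the split is only done once; adjust both to your real data.

```python
import pandas as pd

# Sample data copied by hand from the table in the question (an assumption
# about how the real DataFrame looks).
df = pd.DataFrame({
    "local_term_1year": [30, 20],
    "regional_term_1year": [30, 40],
    "local_term_2year": [40, 50],
    "regional_term_2year": [50, 60],
})

# Wide to long: one row per (column name, value) pair.
new_df = df.melt(var_name="col", value_name="table_value")

# Split each column name on "_" once and reuse the pieces.
parts = new_df["col"].str.split("_")
new_df["region"] = parts.str.get(0)   # first piece: local / regional
new_df["year"] = parts.str.get(-1)    # last piece: 1year / 2year

print(new_df)
```

With this sample, new_df should have eight rows (four columns times two values) and four columns: col, table_value, region and year.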

We can then use seaborn.boxplot to create the boxplot you asked for:

import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(data=new_df, x='year', y='table_value', hue='region')
plt.show()
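If you would rather have two separate graphs, one per year, a possible variation is seaborn.catplot with col='year', which draws one panel for each year. This is only a sketch: the sample df is rebuilt by hand from the question's table, and putting region on both x and hue (with dodge=False) is my own choice for getting one colored box per region.

```python
import pandas as pd
import seaborn as sns

# Sample data copied by hand from the table in the question (an assumption).
df = pd.DataFrame({
    "local_term_1year": [30, 20],
    "regional_term_1year": [30, 40],
    "local_term_2year": [40, 50],
    "regional_term_2year": [50, 60],
})

# Same reshaping as before: long format plus region/year columns.
new_df = df.melt(var_name="col", value_name="table_value")
parts = new_df["col"].str.split("_")
new_df["region"] = parts.str.get(0)
new_df["year"] = parts.str.get(-1)

# One panel per year; boxes are colored by region.
# dodge=False keeps each box centered on its x tick when hue equals x.
g = sns.catplot(
    data=new_df,
    x="region",
    y="table_value",
    hue="region",
    col="year",
    kind="box",
    dodge=False,
)
```

In a script, call matplotlib.pyplot.show() afterwards (or g.savefig with a file name of your choosing) to actually see the figure.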

DISCLAIMER: I have not tested this code. If you provide a Minimal, Complete, and Verifiable example, I can test it, but it should work as is. There is also a particularly helpful guide on how to create such an example for a pandas question.

tobsecret