
My data frames are:

df['graph_df_uni_valid']

             group  MO_SNCE_REC_APP     Label  predictions
0  (-0.001, 25.0]            24324  0.042551     0.042118
1    (25.0, 45.0]            24261  0.035077     0.033748
2    (45.0, 64.0]            23000  0.033391     0.033354
3    (64.0, 83.0]            22960  0.028876     0.028351
4   (83.0, 118.0]            23725  0.028872     0.029056
5  (118.0, 174.0]            23354  0.021024     0.022121
6            miss                0  0.009165     0.008978

df['graph_df_uni_oot']

             group  MO_SNCE_REC_APP     Label  predictions
0  (-0.001, 25.0]            28942  0.033308     0.041806
1    (25.0, 44.0]            28545  0.027921     0.034701
2    (44.0, 64.0]            27934  0.026634     0.033682
3    (64.0, 83.0]            27446  0.021132     0.028101
4   (83.0, 119.0]            28108  0.022236     0.028721
5  (119.0, 171.0]            27812  0.015892     0.020897
6            miss                0  0.007614     0.009352

The issue is that the x-axis of the Test (and OOT) plot is not in sequential order, i.e. the bin (11.0 – 102.0] should be last, not second in the sequence.

My data is already in the correct sequence, so I used sort=False for pointplot (or lineplot) and order=df['graph_df_uni_valid'].sort_values(by='group').group for barplot. But I get the same unordered x-axis with or without these parameters.

Here is my code:

    fig, ax = plt.subplots(nrows = 1, ncols = 2, figsize = (12,5), sharex = False, sharey = True, tight_layout = True)
    fig.supxlabel(desc, ha = 'center', wrap = True)
    fig.suptitle(f"{col} (Rank #{rank}, TotGain: {totgain}, Cum TotGain: {cumtotgain})", fontsize = 16)
  
    ax1_line = ax[0].twinx()
    ax2_line = ax[1].twinx()

   
    
    ax2_line.get_shared_y_axes().join(ax1_line,ax2_line)

    ax[0] = sns.barplot(data = df['graph_df_uni_valid'], ax = ax[0], x = 'group', y = col, color = 'blue', order=df['graph_df_uni_valid'].sort_values(by='group').group)
    ax[0].set(xlabel = '', ylabel = 'Count')
    ax[0].tick_params(axis = 'x', rotation = 60)

    ax1_line = sns.pointplot(data = df['graph_df_uni_valid'], ax = ax1_line, x = 'group', y = target, sort= False, color = 'red', marker = '.')    
    ax1_line = sns.pointplot(data = df['graph_df_uni_valid'], ax = ax1_line, x = 'group', y = sc, sort= False, color = 'green', marker = '.')
    ax1_line.set(xlabel = '', ylabel = 'Book Rate/Score')
    ax[0].set_title('Test (202205 - 202208)')

    ax[1] = sns.barplot(data = df['graph_df_uni_oot'], ax = ax[1], x = 'group', y = col, color = 'blue', order=df['graph_df_uni_oot'].sort_values(by='group').group)
    ax[1].set(xlabel = '', ylabel = 'Count')
    ax[1].tick_params(axis = 'x', rotation = 60)

    ax2_line = sns.pointplot(data = df['graph_df_uni_oot'], ax = ax2_line, x = 'group', y = target, sort= False, color = 'red', marker = '.')
    ax2_line = sns.pointplot(data = df['graph_df_uni_oot'], ax = ax2_line, x = 'group', y = sc, sort=False, color = 'green', marker = '.')
    ax2_line.set(xlabel = '', ylabel = 'Book Rate/Score')    
    ax[1].set_title('OOT (202204)')

If I change the barplot parameter to order=df['graph_df_uni_valid'].index, I get the desired x-axis sequence, but the bars disappear.
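A likely reason the bars vanish, sketched with pandas only: `order` is expected to list category values from the `group` column, while the index holds the integers 0, 1, 2, and so on. The small frame below is an illustrative subset of the rows shown above, and it assumes `group` is stored as plain strings.

```python
import pandas as pd

# Illustrative subset of the frame shown above (assumes 'group' holds strings).
valid = pd.DataFrame({
    "group": ["(-0.001, 25.0]", "(25.0, 45.0]", "miss"],
    "MO_SNCE_REC_APP": [24324, 24261, 0],
})

# order=valid.index would pass the integers 0, 1, 2 as category names.
order_from_index = list(valid.index)

# None of those integers occurs in 'group', so no row matches a requested category.
matches = valid["group"].isin(order_from_index)
print(matches.any())  # False

# Passing the labels themselves keeps the row order and matches every row.
order_from_labels = valid["group"].tolist()
print(valid["group"].isin(order_from_labels).all())  # True
```

With no matching rows, each requested position has nothing to draw, which would explain tick labels in the right order but empty bars.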

versions

  • matplotlib 3.4.0
  • seaborn 0.10.0

Second question: how do I add a legend showing that the red line is 'Book rate', the green line is 'Score', and the blue bars are volume?

Trenton McKinney
nads

2 Answers

  • Aggregating the data with .groupby is not necessary.

    • Although not shown in the OP, the shape of the sample indicates that it was used.
    • sns.barplot and sns.pointplot both have the estimator parameter for setting the type of statistical function to use for aggregation. The default is 'mean'.
      • If there is aggregation, there will be errorbars, which can be removed with the errorbar parameter (ci in older versions).
  • Add a column with pd.cut, which creates categorically ordered bins (ordered=True by default).

    • Since they are ordered, the x-axis will be ordered.
  • Legends:

    • Add labels for plots on ax1 and ax1y
    • Get the handles and labels
    • Delete the axes legend
    • Create a figure legend with the combined handles and labels
  • Tested in python 3.11.2, pandas 2.0.1, matplotlib 3.7.1, seaborn 0.12.2

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# create the dataframe
df = sns.load_dataset('geyser')

# create the categorically ordered groups
df['group'] = pd.cut(df.duration, bins=np.arange(1.6, 5.2, 0.5), ordered=True)

# create the figure and axes
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 5), sharex=False, sharey=True, tight_layout=True)
ax1y = ax1.twinx()
ax2y = ax2.twinx()

# select the data for ax1
long = df[df.kind.eq('long')]

# plot
sns.barplot(data=long, x='group', y='duration', ax=ax1, color='tab:blue', label='Duration', errorbar=None)
sns.pointplot(data=long, x='group', y='waiting', ax=ax1y, color='tab:red', label='Waiting', errorbar=None)

ax1.set(title='Geyser: long wait time and duration')

# create the legends on ax1 and ax1y
ax1.legend()
ax1y.legend()

# get the legend handles and labels
h1, l1 = ax1.get_legend_handles_labels()
h1y, l1y = ax1y.get_legend_handles_labels()

# remove the axes legend
ax1.get_legend().remove()
ax1y.get_legend().remove()

# add a figure legend from the combined handles and labels
fig.legend(h1 + h1y, l1 + l1y, loc='lower center', ncols=2, bbox_to_anchor=(0.5, 0), frameon=False)

# select the data for ax2
short = df[df.kind.eq('short')]

# plot
sns.barplot(data=short, x='group', y='duration', ax=ax2, color='tab:blue', errorbar=None)
sns.pointplot(data=short, x='group', y='waiting', ax=ax2y, color='tab:red', errorbar=None)

_ = ax2.set(title='Geyser: short wait time and duration')


df.head()

   duration  waiting   kind       group
0     3.600       79   long  (3.1, 3.6]
1     1.800       54  short  (1.6, 2.1]
2     3.333       74   long  (3.1, 3.6]
3     2.283       62  short  (2.1, 2.6]
4     4.533       85   long  (4.1, 4.6]
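To check that the bins really carry an order, here is a minimal sketch that uses only the five `duration` values from the `df.head()` output above, with the same `bins` as in the answer. The variable names `durations` and `groups` are just for illustration.

```python
import numpy as np
import pandas as pd

# First five 'duration' values from the df.head() output above.
durations = pd.Series([3.6, 1.8, 3.333, 2.283, 4.533])

# Same bin edges as in the answer; ordered=True is also the default.
groups = pd.cut(durations, bins=np.arange(1.6, 5.2, 0.5), ordered=True)

# The result is an ordered categorical.
print(groups.cat.ordered)  # True

# Categories are listed from the lowest bin to the highest, including empty bins.
print(groups.cat.categories)

# Sorting follows the category order rather than the text of the labels.
print(groups.sort_values())
```

Because the dtype is an ordered categorical, plotting functions that respect category order should place the bins on the x-axis from lowest to highest without an explicit `order`.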
Trenton McKinney
  • Thanks @trenton. I posted my answer which solved my x-axis ordering issue and changed y label to include color for now. Your legend method is not working for me, probably b/c of version conflict. – nads May 25 '23 at 20:21
  • @nads figure legends were added in matplotlib 3.7. You can still create the legend as shown. But instead of using `fig.legend(...)`, use `ax1.legend(...)`, but you'll have to change `loc=` and `bbox_to_anchor=` to get it into the correct position. – Trenton McKinney May 25 '23 at 20:25
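Following the comment above, a minimal sketch of the axes-level variant. It uses plain matplotlib with placeholder numbers (not data from the post), and the `loc` and `bbox_to_anchor` values are only a starting point that will probably need adjusting. The `ncol` spelling is used because older matplotlib releases do not accept `ncols`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(6, 4))
ax1y = ax1.twinx()

# Placeholder values, only to have something to label.
x = [0, 1, 2]
ax1.bar(x, [3, 2, 1], color="tab:blue", label="Volume")
ax1y.plot(x, [0.040, 0.030, 0.020], color="tab:red", marker=".", label="Book rate")
ax1y.plot(x, [0.041, 0.031, 0.021], color="tab:green", marker=".", label="Score")

# Collect handles and labels from both axes.
h1, l1 = ax1.get_legend_handles_labels()
h1y, l1y = ax1y.get_legend_handles_labels()

# One combined legend attached to ax1, placed below the plot area.
legend = ax1.legend(h1 + h1y, l1 + l1y, loc="upper center",
                    bbox_to_anchor=(0.5, -0.1), ncol=3, frameon=False)
```

Placing the legend outside the plot area also avoids it being covered by the lines drawn on the twin axes.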

Since my data was already in the correct sequence, I only had to use sort=False for pointplot (or lineplot) and drop the order parameter from barplot. The x-axis is now in the correct order.
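A minimal sketch of why dropping `order` helps, assuming the `group` column holds plain strings (the 'miss' row suggests it does). Sorting strings is lexicographic, so `sort_values(by='group')` moves the bin starting with "(1" ahead of the one starting with "(2". The labels are copied from the `graph_df_uni_valid` frame in the question.

```python
import pandas as pd

# Labels copied from graph_df_uni_valid, in their original row order.
labels = [
    "(-0.001, 25.0]", "(25.0, 45.0]", "(45.0, 64.0]", "(64.0, 83.0]",
    "(83.0, 118.0]", "(118.0, 174.0]", "miss",
]
valid = pd.DataFrame({"group": labels})

# What order=...sort_values(by='group').group passes to barplot.
sorted_labels = valid.sort_values(by="group")["group"].tolist()
print(sorted_labels[:2])  # ['(-0.001, 25.0]', '(118.0, 174.0]']

# The row order is already the desired sequence.
row_order = valid["group"].tolist()
print(row_order == sorted_labels)  # False
```

The highest numeric bin lands in second place after sorting, which matches the symptom described in the question.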


nads