
Is there a better way of adding a single label to a legend for a set of boxplots?

Below is a simple worked example that gives the desired result. This is done by creating an invisible line (alpha=0) with the desired label, then setting its alpha back to 1 via the legend handles. However, can a single label for all the boxplots just be passed to sns.boxplot()?

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Get the tips dataset and select a subset as an example
tips = sns.load_dataset("tips")
variable_to_bin_by = 'tip'
binned_variable = 'total_bill'
df = tips[ [binned_variable,  variable_to_bin_by] ]  

# Group the data by a list of bins
bins = np.array([0, 1, 2, 3, 4])
gdf = df.groupby( pd.cut(df[variable_to_bin_by].values, bins ) )
data = [ i[1][binned_variable].values for i in gdf]
df = pd.DataFrame( data, index = bins[:-1])   

# Plot the data (using boxplots to show spread of real values)
fig, ax = plt.subplots()
ax = sns.boxplot( data=df.T, ax=ax, color='k')

# Create hidden line with the extra label (to give label to boxplots)
x = np.arange(10)
plt.plot(x, x, label='REAL DATA', color='k', alpha=0)

# Now plot some "model fit" lines
models = {'model1': bins+10, 'model2': bins+10*1.5, 'model3': bins*10}
for key in sorted( models.keys() ):
    plt.plot( bins, models[key], label=key )

# Add a legend
leg = plt.legend()

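# Update line visibility (alpha); the attribute is `legend_handles` in
# matplotlib >= 3.7 (older versions only provide `legendHandles`)
for legobj in leg.legend_handles:
    legobj.set_alpha(1)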

# Show the plot
plt.show()

Although this gives the desired result (as below), my question is whether there is a better way.

[Image: the resulting plot]

tsherwen

1 Answer

Instead of using a line that has some data, which would then need to be made invisible in the plot and then visible in the legend, you may directly create an empty line with the properties you want to show in the legend (here, the color).

plt.plot([], [], label='REAL DATA', color='k')

This avoids playing with the alpha in the plot and the legend. The complete example would then look like:

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Get the tips dataset and select a subset as an example
tips = sns.load_dataset("tips")
variable_to_bin_by = 'tip'
binned_variable = 'total_bill'
df = tips[ [binned_variable,  variable_to_bin_by] ]  

# Group the data by a list of bins
bins = np.array([0, 1, 2, 3, 4])
gdf = df.groupby( pd.cut(df[variable_to_bin_by].values, bins ) )
data = [ i[1][binned_variable].values for i in gdf]
df = pd.DataFrame( data, index = bins[:-1])   

# Plot the data (using boxplots to show spread of real values)
fig, ax = plt.subplots()
ax = sns.boxplot( data=df.T, ax=ax, color="grey")

# Create an empty line that only carries the extra label (legend entry for the boxplots)
plt.plot([], [], label='REAL DATA', color='k')

# Now plot some "model fit" lines
models = {'model1': bins+10, 'model2': bins+10*1.5, 'model3': bins*10}
for key in sorted( models.keys() ):
    plt.plot( bins, models[key], label=key, zorder=3)

# Add a legend
leg = plt.legend()

# Show the plot
plt.show()

[Image: the resulting plot]
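A variation on the same idea, following matplotlib's "proxy artist" pattern, is to create the legend entry with matplotlib.lines.Line2D and pass it to legend() explicitly. Unlike plt.plot([], []), the proxy is never added to the Axes, so nothing extra is drawn. The sketch below is an illustration under assumptions: it uses synthetic random data instead of the binned tips data (so no dataset download is needed) and ax.boxplot instead of sns.boxplot; since seaborn draws onto the same matplotlib Axes, the legend step should be unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np

# Synthetic stand-in for the binned 'total_bill' values (assumption)
rng = np.random.default_rng(0)
bins = np.array([0, 1, 2, 3, 4])
data = [rng.normal(loc=10 + 5 * i, scale=3, size=30) for i in range(len(bins) - 1)]

fig, ax = plt.subplots()
# Draw the boxes at x = 0..3, the same positions seaborn uses
ax.boxplot(data, positions=list(range(len(data))))

# Plot the labelled "model fit" lines
models = {'model1': bins + 10, 'model2': bins + 10 * 1.5, 'model3': bins * 10}
for key in sorted(models):
    ax.plot(bins, models[key], label=key, zorder=3)

# Collect the labelled lines, then prepend a proxy artist for the boxplots;
# the proxy exists only in the legend, not on the Axes
handles, labels = ax.get_legend_handles_labels()
proxy = Line2D([], [], color='k', label='REAL DATA')
leg = ax.legend(handles=[proxy] + handles)
```

Passing handles explicitly also gives direct control over the order of the legend entries.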

ImportanceOfBeingErnest
  • Thanks for the input. What I am really trying to ask is whether there is a way to pass the label to `sns.boxplot()` and have `plt.legend()` pick it up. I checked, and I already use the blank-line creation you suggest ([as shown by this answer](https://stackoverflow.com/a/45220580/2543267)) elsewhere in my code; I just forgot to use it when I quickly threw together the question. – tsherwen Jan 31 '18 at 10:21
  • Even the matplotlib boxplot function does not have the option to specify legend labels. Because seaborn essentially just calls the matplotlib boxplot function, it does not provide that option either. In that sense the answer is simply: No! – ImportanceOfBeingErnest Jan 31 '18 at 12:51
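One workaround within that limitation: matplotlib's Axes.boxplot returns a dict of the artists it drew (keys such as 'boxes', 'medians' and 'whiskers'), and setting a label on just one of those artists is enough for legend() to find it. This is a sketch under assumptions: it uses plain matplotlib rather than seaborn (the artists seaborn creates differ between versions) and synthetic data in place of the tips dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the binned 'total_bill' values (assumption)
rng = np.random.default_rng(0)
bins = np.array([0, 1, 2, 3, 4])
data = [rng.normal(loc=10 + 5 * i, scale=3, size=30) for i in range(len(bins) - 1)]

fig, ax = plt.subplots()
# Draw the boxes at x = 0..3, matching the positions seaborn uses
bp = ax.boxplot(data, positions=list(range(len(data))))

# Label only the first box; the other boxes stay unlabelled, so the
# legend gets a single 'REAL DATA' entry for the whole set
bp['boxes'][0].set_label('REAL DATA')

# Plot the labelled "model fit" lines
models = {'model1': bins + 10, 'model2': bins + 10 * 1.5, 'model3': bins * 10}
for key in sorted(models):
    ax.plot(bins, models[key], label=key, zorder=3)

leg = ax.legend()
```

Because the labelled artist is a real box outline, its legend entry takes the box's line style automatically.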