
I'm struggling with two small formatting issues with my Seaborn boxplot and histogram (plotted as subplots).

  1. The colors differ slightly between the two subplots even though I specify exactly the same colors in both.
  2. I'm trying to rearrange the legend so that 'Group A' appears above 'Group B'.
import matplotlib.pyplot as plt
import seaborn as sns

groupA = [94, 74, 65, 36, 32, 65, 56, 59, 24, 133, 16, 8, 18]
groupB = [1, 1, 1, 1, 2, 7, 7, 10, 15, 16, 17, 17, 19, 29, 31, 32, 43, 43, 44, 47, 56, 64, 64, 80, 81, 87, 103, 121, 121, 121, 187, 197, 236, 292, 319, 8, 12, 12, 14, 14, 15, 16, 16, 20, 20, 33, 36, 37, 37, 44, 46, 48, 51, 51, 54, 57, 72, 74, 95, 103, 103, 107, 134, 199, 216, 228, 254]
f, (ax_boxplot, ax_histogram) = plt.subplots(2, sharex=True, gridspec_kw={'height_ratios': (0.3,0.7)}, figsize=(10,10))
sns.boxplot(data=[groupA, groupB], ax=ax_boxplot, orient='h', palette=['green', 'silver'])
ax_boxplot.tick_params(axis='y', left=False, labelleft=False)
sns.histplot(data=[groupA, groupB], bins=34, binrange=(0,340), palette=['green', 'silver'], alpha=1, edgecolor='black')
ax_histogram.tick_params(axis='both', labelsize=18)
ax_histogram.legend(labels=['groupB', 'groupA'], fontsize=16, frameon=False)
plt.xlabel("Days", fontsize=24, labelpad=20)
plt.ylabel("Count", fontsize=24, labelpad=20)
sns.despine()

What I have tried so far:

  • For the colors: I tried setting alpha to 1 in the histogram, but there still seems to be a slight difference.
  • For the legend: I tried playing around with hue_order and handles, but couldn't get either to work.

[Image: histogram and boxplot subplots]

KatC

3 Answers


Try using saturation=1 in your call to boxplot. Unless specified, saturation is equal to 0.75.

The documentation says:

saturation float, optional

Proportion of the original saturation to draw colors at. Large patches often look better with slightly desaturated colors, but set this to 1 if you want the plot colors to perfectly match the input color.
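A quick way to see what that 0.75 does, assuming boxplot applies it through seaborn's own sns.desaturate helper (which scales the saturation channel in HLS space). Note that 'silver' is a pure gray with zero saturation, so only the green box should actually change; that would explain why the difference looks "slight":

```python
import matplotlib.colors as mcolors
import numpy as np
import seaborn as sns

# Assumption: boxplot's default saturation=0.75 behaves like sns.desaturate(color, 0.75)
unchanged = {}
for name in ['green', 'silver']:
    requested = mcolors.to_rgb(name)
    drawn = sns.desaturate(name, 0.75)  # HLS saturation multiplied by 0.75
    unchanged[name] = np.allclose(requested, drawn)
    print(f"{name}: requested {requested}, drawn {tuple(round(c, 3) for c in drawn)}")

print(unchanged)
```

With saturation=1 no scaling happens, so the box colors match the histogram exactly.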

Diziet Asahi

Your coloring issue comes up often on StackOverflow, e.g. "Avoid Seaborn barplot desaturation of colors", "Seaborn chart colors are different from those specified by palette" or "Inconsistent colours from custom seaborn palette". Seaborn's author prefers slightly desaturated colors for large rectangles such as boxes and bars, so this is enabled by default.

Seaborn creates its own legends, which often differ from what you get by calling matplotlib's ax.legend(...). To change the parameters of such a legend, Seaborn provides the sns.move_legend() function. move_legend is primarily meant to change the position, but it also accepts other legend parameters (except the item labels). As the new position is a required parameter, you can pass loc='best', which is matplotlib's default.

For the labels in the legend, Seaborn's usual way is a "long form" dataframe where one column is used as hue=. But Seaborn also supports a dictionary as data. Then the keys of the dictionary serve as legend labels.

Note that unless you use sns.histplot(..., multiple='stack') (or multiple='dodge'), the bars of the histogram drawn last will partially or completely hide the bars of the other histogram. That can be very confusing, which is why some transparency is set by default.

import matplotlib.pyplot as plt
import seaborn as sns

groupA = [94, 74, 65, 36, 32, 65, 56, 59, 24, 133, 16, 8, 18]
groupB = [1, 1, 1, 1, 2, 7, 7, 10, 15, 16, 17, 17, 19, 29, 31, 32, 43, 43, 44, 47, 56, 64, 64, 80, 81, 87, 103, 121, 121, 121, 187, 197, 236, 292, 319, 8, 12, 12, 14, 14, 15, 16, 16, 20, 20, 33, 36, 37, 37, 44, 46, 48, 51, 51, 54, 57, 72, 74, 95, 103, 103, 107, 134, 199, 216, 228, 254]
f, (ax_boxplot, ax_histogram) = plt.subplots(2, sharex=True,
                                             gridspec_kw={'height_ratios': (0.3, 0.7)}, figsize=(10, 10))

# saturation=1 stops seaborn from desaturating the box colors
sns.boxplot(data=[groupA, groupB], ax=ax_boxplot, orient='h',
            palette=['green', 'silver'], saturation=1)
ax_boxplot.tick_params(axis='y', left=False, labelleft=False)

# the dictionary keys become the legend labels, in insertion order
sns.histplot(data={'Group A': groupA, 'Group B': groupB},
             bins=34, binrange=(0, 340),
             palette=['green', 'silver'], alpha=1, edgecolor='black')
ax_histogram.tick_params(axis='both', labelsize=18)
ax_histogram.set_xlabel("Days", fontsize=24, labelpad=20)
ax_histogram.set_ylabel("Count", fontsize=24, labelpad=20)
sns.move_legend(ax_histogram, loc='best', fontsize=24, frameon=False)
sns.despine()

plt.show()

[Image: sns.histplot with changed legend]

JohanC
  • As mentioned by @JohanC, putting the data into a long-form dataframe lets seaborn add the legend labels and handle their order automatically.
  • pd.DataFrame(data=v, columns=['Days']).assign(Group=group) is used to create a dataframe for each list, where .assign creates a column called 'Group' for the name of the data. The two dataframes are combined with pd.concat.
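If the list comprehension inside pd.concat looks dense, here is a quick check that it builds the same frame as a flat construction that concatenates the values and repeats each group label. days_a and days_b are short made-up lists standing in for groupA and groupB, and the flat version is just an alternative, not part of the answer's code:

```python
import pandas as pd

# Hypothetical short lists standing in for groupA and groupB
days_a = [94, 74, 65]
days_b = [1, 1, 2, 7]

# Construction used in the answer: one frame per list, tagged via .assign
df_concat = pd.concat(
    [pd.DataFrame(data=v, columns=['Days']).assign(Group=group)
     for v, group in zip([days_a, days_b], ['A', 'B'])],
    ignore_index=True)

# Flat alternative: concatenate the values and repeat each label
df_flat = pd.DataFrame({
    'Days': days_a + days_b,
    'Group': ['A'] * len(days_a) + ['B'] * len(days_b),
})

print(df_concat.equals(df_flat))
```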
import pandas as pd
import seaborn as sns
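import matplotlib.pyplot as plt

# the lists from the question
groupA = [94, 74, 65, 36, 32, 65, 56, 59, 24, 133, 16, 8, 18]
groupB = [1, 1, 1, 1, 2, 7, 7, 10, 15, 16, 17, 17, 19, 29, 31, 32, 43, 43, 44, 47, 56, 64, 64, 80, 81, 87, 103, 121, 121, 121, 187, 197, 236, 292, 319, 8, 12, 12, 14, 14, 15, 16, 16, 20, 20, 33, 36, 37, 37, 44, 46, 48, 51, 51, 54, 57, 72, 74, 95, 103, 103, 107, 134, 199, 216, 228, 254]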

# set the matplotlib rc parameters (global setting)
params = {'axes.labelsize': 24,
          'axes.titlesize': 24,
          'axes.labelpad': 20,
          'axes.spines.top': False,
          'axes.spines.right': False,
          'ytick.labelsize': 18,
          'xtick.labelsize': 18,
          'legend.fontsize': 20,
          'legend.frameon': False,
          'legend.title_fontsize': 16}

plt.rcParams.update(params)

# create the dataframe with a column defining the groups
df = pd.concat([pd.DataFrame(data=v, columns=['Days']).assign(Group=group) for v, group in zip([groupA, groupB], ['A', 'B'])], ignore_index=True)

# create the figure and axes
fig, (ax_boxplot, ax_histogram) = plt.subplots(2, sharex=True, gridspec_kw={'height_ratios': (0.3,0.7)}, figsize=(10,10))

# plot the histplot from df
sns.histplot(data=df, x='Days', hue='Group', bins=34, binrange=(0,340), palette=['green', 'silver'], alpha=1, edgecolor='black', ax=ax_histogram)

# plot the boxplot from df; saturation=1 keeps the box colors identical to the histogram
sns.boxplot(data=df, x='Days', y='Group', ax=ax_boxplot, palette=['green', 'silver'], saturation=1)
ax_boxplot.tick_params(axis='y', left=False, labelleft=False, bottom=False)
_ = ax_boxplot.set(xlabel='', ylabel='')

plt.show()


df.head(15)

    Days Group
0     94     A
1     74     A
2     65     A
3     36     A
4     32     A
5     65     A
6     56     A
7     59     A
8     24     A
9    133     A
10    16     A
11     8     A
12    18     A
13     1     B
14     1     B
Trenton McKinney
  • Nice. I just noticed that seaborn accepts the "wide form" dictionary for the `histplot` but not (yet) for the `boxplot`. It's part of Michael's ongoing refactoring efforts. Personally, I feel a dictionary has the benefit of looking less cryptic to new users and, unlike a wide dataframe, uneven column lengths aren't a problem. – JohanC Apr 27 '23 at 21:05
  • @JohanC Thanks. I personally prefer, and recommend, using a DataFrame, because a visualization should go hand-in-hand with analysis, and a DataFrame can’t be beat for making analysis easier. – Trenton McKinney Apr 27 '23 at 21:19
  • Indeed, a dictionary is only suited for simple use cases: a few 1D data sets without extra information. Once the data to be studied gets more elaborate, a simple dictionary won't suffice. – JohanC Apr 27 '23 at 22:21