43

How can I label each boxplot in a seaborn plot with the median value?

E.g.

import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=tips)

How do I label each boxplot with the median or average value?

user308827

3 Answers

79

I love when people include sample datasets!

import seaborn as sns

sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
box_plot = sns.boxplot(x="day",y="total_bill",data=tips)

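# median of total_bill per day; groups follow the category order used on the x-axis
medians = tips.groupby('day', observed=False)['total_bill'].median()
vertical_offset = tips['total_bill'].median() * 0.05  # offset from median for display

for xtick in box_plot.get_xticks():
    # xtick is an integer position, not a day label, so index positionally
    median = medians.iloc[int(xtick)]
    box_plot.text(xtick, median + vertical_offset, median,
                  horizontalalignment='center', size='x-small', color='w', weight='semibold')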

mechanical_meat
  • 2
    Note that the effect of 0.5 after medians[tick] is sensitive to the scale of one's data. For my small scale, it pushed the white text up into the white background and it took me a while to figure out why it wasn't showing. Scale 0.5 as needed. – Matt Kleinsmith Dec 06 '17 at 20:03
  • 1
    note: the `np.round(s, 2)` above can be replaced with just `s`; and moreover, the `zip()` and `get_xticklabels()` commands are unnecessary here. The trick here is that the placement of each label is determined by the median value itself (as y value), and the categorical labels (which, I guess, are represented by integers along the x axis) (as x value). Extracting the xticklabels could be helpful if the info you want to annotate with is stored in a data frame, since you could then use the xticklabels for indexing. – MMelnicki Feb 20 '19 at 19:20
  • 1
    HA! +1 for **I love when people include sample datasets!**. Me too. – Trenton McKinney Aug 22 '20 at 05:24
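Following up on the comment about the offset being sensitive to the scale of the data: one possible way around this is to give the offset in points instead of data units, using `Axes.annotate` with `textcoords="offset points"`. The following is only a minimal sketch; the frame `df`, its values, and the `order` list are made up for illustration and stand in for the tips dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; replace with your own frame (e.g. tips)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 12.0, 20.0, 8.0, 15.0, 16.0],
})
order = ["Thur", "Fri"]  # explicit category order, so x positions are 0, 1

ax = sns.boxplot(x="day", y="total_bill", data=df, order=order)
medians = df.groupby("day")["total_bill"].median()

labels = []
for xpos, day in enumerate(order):
    # xy is given in data coordinates; xytext is an offset in points,
    # so the gap above the median line does not depend on the data scale
    label = ax.annotate(f"{medians[day]:.1f}", xy=(xpos, medians[day]),
                        xytext=(0, 3), textcoords="offset points",
                        ha="center", va="bottom",
                        size="x-small", color="w", weight="semibold")
    labels.append(label)
```

The 3-point offset is an arbitrary choice; since it is measured in points, it should look the same whether the y-axis spans 0 to 1 or 0 to 10,000.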
35

Based on ShikharDua's approach, I created a version that works independently of tick positions. This comes in handy when dealing with grouped data in seaborn (i.e. when using the hue parameter). Additionally, I added flier and orientation detection.

[Figure: grouped data with median labels in multiple formats]

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects


def add_median_labels(ax, fmt='.1f'):
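    lines = ax.get_lines()
    # the box bodies are drawn as PathPatch artists, one per box
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    # every box owns the same number of lines (6 with fliers, 5 without)
    lines_per_box = len(lines) // len(boxes)
    # the median is the line at index 4 within each box's group of lines
    for median in lines[4::lines_per_box]: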
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])


tips = sns.load_dataset("tips")

ax = sns.boxplot(data=tips, x='day', y='total_bill', hue="sex")
add_median_labels(ax)
plt.show()
Christian Karcher
  • Your solution is awesome and I am trying to figure out the details. You access the "data" via `median.get_data()` and `median.get_xdata()`. Is there also a generalized way to get the number of values (`n`) for each box, or other values like `mean()`, `stdev()`? – buhtz Nov 22 '22 at 09:34
  • Unfortunately not. All I work with is what "is visible to the eye", i.e. the coordinates of the box and its lines. Everything else is lost by that point. One way to get the statistics is to get a description by pandas in a separate step (see e.g. https://stackoverflow.com/a/59667335/9501624) – Christian Karcher Nov 22 '22 at 09:52
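As the reply above points out, values such as `n`, the mean, or the standard deviation cannot be recovered from the drawn artists, but they can be computed with pandas in a separate step and then turned into label strings. A rough sketch, assuming a made-up frame `df` in place of the real data (the column names merely mirror the tips dataset):

```python
import pandas as pd

# Hypothetical stand-in data; replace with your own frame (e.g. tips)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri"],
    "total_bill": [10.0, 12.0, 20.0, 8.0, 16.0],
})

# One row per group, one column per statistic
stats = df.groupby("day")["total_bill"].agg(["count", "mean", "std", "median"])

# Build one label string per category; these could be passed to ax.text(...)
labels = {day: f"n={int(row['count'])}, mean={row['mean']:.1f}"
          for day, row in stats.iterrows()}
print(labels)
```

The resulting strings can then be placed with `ax.text` at each box's x position, in the same way the median labels are placed.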
28

This can also be achieved by deriving the median from the plot itself, without explicitly computing it from the data:

import seaborn as sns

tips = sns.load_dataset("tips")
box_plot = sns.boxplot(x="day", y="total_bill", data=tips)

ax = box_plot.axes
lines = ax.get_lines()
categories = ax.get_xticks()

for cat in categories:
    # each box contributes 6 lines; the line at index 4 within each group is the median
    # first y-value of each line: 0 -> p25, 1 -> p75, 2 -> lower whisker, 3 -> upper whisker,
    # 4 -> p50 (median), 5 -> fliers (outlier points)
    y = round(lines[4 + cat * 6].get_ydata()[0], 1)

    ax.text(
        cat, 
        y, 
        f'{y}', 
        ha='center', 
        va='center', 
        fontweight='bold', 
        size=10,
        color='white',
        bbox=dict(facecolor='#445A64'))

box_plot.figure.tight_layout()

ShikharDua
  • 2
    works great! One remark: if fliers are disabled, the interval changes from 6 to 5 (due to the missing flier-"line"). So now I have to think about a technique how to get this working for data grouped via hue values... – Christian Karcher Aug 06 '20 at 11:30
  • Can you also figure out the `n` per box? – buhtz Nov 22 '22 at 09:49
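Regarding the remark about disabled fliers: instead of hard-coding the interval of 6, the stride can be derived from the number of lines and the number of boxes. This is only a sketch. It assumes that every box contributes the same number of lines and that the median sits at offset 4 within each group, as described in the answer; that layout may differ between seaborn and matplotlib versions. The frame `df` is made-up illustration data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; replace with your own frame (e.g. tips)
df = pd.DataFrame({
    "day": ["Thur"] * 3 + ["Fri"] * 3,
    "total_bill": [10.0, 12.0, 20.0, 8.0, 15.0, 16.0],
})

ax = sns.boxplot(x="day", y="total_bill", data=df,
                 order=["Thur", "Fri"], showfliers=False)

lines = ax.get_lines()
n_boxes = len(ax.get_xticks())  # one tick per category when no hue is used
# assumed layout: 2 whiskers + 2 caps + 1 median (+ 1 flier line if shown)
lines_per_box = len(lines) // n_boxes

# median line is assumed to sit at offset 4 within each box's group of lines
medians = [float(lines[4 + i * lines_per_box].get_ydata()[0])
           for i in range(n_boxes)]
print(lines_per_box, medians)
```

Counting boxes via the x ticks only works for ungrouped plots; with `hue`, counting the box patches as in Christian Karcher's answer is the more general route.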