I have a plot with qualitative variables on the x-axis, and vertical scattered points for each category using sns.stripplot. I would like to indicate the mean value for each category. Perhaps a short horizontal line at the mean y value for each category. How can I do this?

Harry Stuart

You could use matplotlib.pyplot.hlines, with some bookkeeping of the widths and locations of the lines. Here is an example using the seaborn tips dataset:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
sns.stripplot(x="day", y="total_bill", data=tips)

# Category names and x positions, in the order stripplot drew them
labels = [e.get_text() for e in plt.gca().get_xticklabels()]
ticks = plt.gca().get_xticks()
w = 0.1  # half-width of each mean line

# enumerate yields (index, item), so unpack the position first
for idx, day in enumerate(labels):
    plt.hlines(tips[tips['day'] == day]['total_bill'].mean(), ticks[idx]-w, ticks[idx]+w)
plt.show()

Some Explanation

labels = [e.get_text() for e in plt.gca().get_xticklabels()]

Extracts the text of the tick labels that sns.stripplot generates automatically. This is more reliable than tips['day'].unique(), because the order of the labels does not necessarily match the order returned by tips['day'].unique(): if the order argument of stripplot is not specified, the levels are

[...] inferred from the data objects.

In the tips dataset, for example, day is a pandas categorical, so the strips are drawn as Thur, Fri, Sat, Sun, while unique() returns the days in the order they first appear in the rows.
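A small sketch of that ordering difference follows. It uses hypothetical data instead of the tips dataset (so it runs offline) and the non-interactive Agg backend, and it assumes a recent seaborn in which a pandas categorical column is laid out in its declared category order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: rows appear as Sun, Sat, Thur, but the categorical
# dtype declares the order Thur, Sat, Sun
df = pd.DataFrame({"day": ["Sun", "Sat", "Thur", "Sun"],
                   "total_bill": [20.0, 15.0, 8.0, 30.0]})
df["day"] = pd.Categorical(df["day"], categories=["Thur", "Sat", "Sun"])

# Order of first appearance in the rows
appearance = list(df["day"].unique())

fig, ax = plt.subplots()
sns.stripplot(x="day", y="total_bill", data=df, ax=ax)
fig.canvas.draw()  # render once so the tick label text is filled in

# Order in which stripplot actually laid out the strips
drawn = [t.get_text() for t in ax.get_xticklabels()]
```

Here appearance follows the row order while drawn follows the declared categories, which is why the answer reads the labels back from the axis (passing order= to stripplot explicitly is another way to pin the order down).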

plt.hlines(tips[tips['day'] == day]['total_bill'].mean(), ticks[idx]-w, ticks[idx]+w)

Draws a horizontal line of length 2*w centered on the current strip, at the height of the mean 'total_bill' over the rows whose 'day' equals the current day.
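Since hlines also accepts arrays for y, xmin and xmax, the loop can be collapsed into a single call once the means are computed with one groupby. This is a sketch on hypothetical data; it assumes the strips are centered at x = 0, 1, 2, ... (seaborn's fixed positions for a categorical axis when native_scale is off) and draws on a plain Matplotlib axis so the resulting segment coordinates are easy to inspect.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for tips
df = pd.DataFrame({"day": ["Fri", "Fri", "Sat", "Sat"],
                   "total_bill": [10.0, 14.0, 20.0, 30.0]})
order = ["Fri", "Sat"]  # category order as it appears on the x-axis

# Mean per category, aligned with `order`
means = df.groupby("day")["total_bill"].mean().reindex(order).to_numpy()

x = np.arange(len(order))  # assumed strip centers: 0, 1, ...
w = 0.1                    # half-width of each mean line

fig, ax = plt.subplots()
# One call draws every mean line: segment i spans x[i]-w .. x[i]+w at height means[i]
lines = ax.hlines(means, x - w, x + w, colors="black")
segments = lines.get_segments()
```

With seaborn, the same hlines call would go on the axis returned by sns.stripplot, after the points have been drawn.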

William Miller