
I have a distplot and I would like to plot a mean line that goes from 0 to the y value of the mean frequency, so that the line stops where the distplot curve does. Why isn't there a simple parameter that does this? It would be very useful.

I have some code that gets me almost there:

plt.plot([x.mean(),x.mean()], [0, *what here?*])

This code plots the line just as I'd like, except for the y-value. What math would make the top of the line stop at the frequency of the mean in the distplot? Below is an example of one of my distplots, using 0.6 as the y-max. I have tried dividing the mean by the count, among other things.

[Image: example distplot with the mean line drawn up to y = 0.6]

Trenton McKinney
bismo
  • "Why isn't there a simple parameter that does this?" Because if libraries tried to build an API that handled every domain's and user's specific use case, the API would be incredibly bloated and hard to use and be an incredible burden to maintain. – Paul H Aug 07 '20 at 20:30
  • Could you explain the difference between the title *"between limits of y axis"* and the post *"a line that goes from 0 to the y value of the mean frequency"*? (By the way, the curve shows the [density](https://en.wikipedia.org/wiki/Probability_density_function), not the frequency.) – JohanC Aug 07 '20 at 22:47

2 Answers


Update for matplotlib 3.3.4 and seaborn 0.11.1: kdeplot with shade=True no longer creates a line object. To get the same result as before, set shade=False (fill=False in newer seaborn), which still creates the line object, and fill the curve with ax.fill_between(). The code below has been changed accordingly. (See the revision history for the older version.)

ax.lines[0] is the KDE curve (a Line2D), from which you can extract the x and y data with get_xdata() and get_ydata(). np.interp then finds the height of the curve at a given x-value:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

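# example data: 250 points in five clusters around random centers
x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
# fill=False (shade=False in older code) keeps the KDE curve as a Line2D in ax.lines
ax = sns.kdeplot(x, fill=False, color='crimson')
kdeline = ax.lines[0]  # the KDE curve
mean = x.mean()
xs = kdeline.get_xdata()
ys = kdeline.get_ydata()
height = np.interp(mean, xs, ys)  # height of the curve at the mean
ax.vlines(mean, 0, height, color='crimson', ls=':')  # dotted line from 0 up to the curve
ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)  # shade under the curve
plt.show()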

[Image: example plot, a KDE curve with a dotted vertical line from 0 up to the curve at the mean]
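To see what the dotted line's height represents without plotting anything, here is a numpy-only sketch. It evaluates a Gaussian KDE by hand with Scott's rule bandwidth (the default of scipy.stats.gaussian_kde; assumed here to approximate seaborn's default curve, whose grid and bandwidth details may differ), samples it on a grid the way the plotted Line2D holds it, and compares np.interp at the mean with the KDE evaluated exactly at the mean. The helper gaussian_kde_1d and the grid bounds are hypothetical choices for illustration. Note that the values are densities (the curve integrates to 1), not counts.

```python
import numpy as np


def gaussian_kde_1d(data, points):
    """Hypothetical helper: evaluate a 1-D Gaussian KDE at each of `points`."""
    data = np.asarray(data, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)  # Scott's rule bandwidth in one dimension
    z = (points[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = rng.normal(np.tile(rng.uniform(10, 30, 5), 50), 3)  # same kind of data as above

# Sample the curve on a grid, as the plotted line does (grid bounds are assumed)
xs = np.linspace(x.min() - 5, x.max() + 5, 200)
ys = gaussian_kde_1d(x, xs)

mean = x.mean()
height = np.interp(mean, xs, ys)     # what the answer reads off the line data
exact = gaussian_kde_1d(x, mean)[0]  # KDE evaluated directly at the mean
print(f"interpolated height: {height:.5f}, exact KDE value: {exact:.5f}")
```

Because the grid is fine compared with the bandwidth, linear interpolation between neighbouring grid points is already very close to the exact value.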

The same approach can be extended to show the mean together with the standard deviation, or the median and the quartiles:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
for ax in axes:
    # fill=False (shade=False in older code) keeps the KDE curve as a Line2D
    sns.kdeplot(x, fill=False, color='crimson', ax=ax)
    kdeline = ax.lines[0]
    xs = kdeline.get_xdata()
    ys = kdeline.get_ydata()
    if ax is axes[0]:
        # left panel: mean with one standard deviation on either side
        middle = x.mean()
        sdev = x.std()
        left = middle - sdev
        right = middle + sdev
        ax.set_title('Showing mean and sdev')
    else:
        # right panel: median with the 25th and 75th percentiles
        left, middle, right = np.percentile(x, [25, 50, 75])
        ax.set_title('Showing median and quartiles')
    # dotted line from 0 up to the curve at the central value
    ax.vlines(middle, 0, np.interp(middle, xs, ys), color='crimson', ls=':')
    # light fill under the whole curve
    ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)
    # a second, overlapping fill darkens the region between left and right
    ax.fill_between(xs, 0, ys, where=(left <= xs) & (xs <= right), interpolate=True, facecolor='crimson', alpha=0.2)
plt.show()

[Image: two panels, "Showing mean and sdev" and "Showing median and quartiles"]
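The darker band in each panel comes from the where mask passed to fill_between. As a numpy-only check of what that band covers, the sketch below uses a standard normal density as an assumed stand-in for the KDE line data, with left and right at mean ± 1 sdev, and integrates the masked points with the trapezoid rule. For a normal curve the result should be close to the familiar 68%.

```python
import numpy as np

# Stand-in for the KDE line data: a standard normal density sampled on a grid
xs = np.linspace(-4, 4, 801)
ys = np.exp(-0.5 * xs**2) / np.sqrt(2 * np.pi)

left, right = -1.0, 1.0  # mean - sdev and mean + sdev for this curve

# The same boolean mask that the answer passes to fill_between(where=...)
mask = (left <= xs) & (xs <= right)

# Trapezoid rule over the masked points: the area of the darker band
xb, yb = xs[mask], ys[mask]
band_area = np.sum((yb[1:] + yb[:-1]) / 2 * np.diff(xb))
print(f"area between left and right: {band_area:.4f}")
```

Unlike fill_between(..., interpolate=True), this mask-only sum ignores the thin slivers between the outermost selected grid points and the exact boundaries, which is negligible on a fine grid.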

PS: to mark the mode of the KDE (the x-value where the curve peaks):

    mode_idx = np.argmax(ys)  # index of the highest point of the curve
    ax.vlines(xs[mode_idx], 0, ys[mode_idx], color='lime', ls='--')
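The argmax approach locates the mode only to the resolution of the curve's grid, and on a multimodal KDE it picks the global peak rather than the first one. A numpy-only sketch on a made-up two-peaked curve (standing in for the KDE line data) shows both points:

```python
import numpy as np

# Stand-in curve with two peaks: the taller one near x = 2, a lower one near x = -2
xs = np.linspace(-5, 5, 1001)
ys = np.exp(-0.5 * (xs - 2) ** 2) + 0.5 * np.exp(-0.5 * (xs + 2) ** 2)

mode_idx = np.argmax(ys)  # index of the highest sampled point (the global peak)
mode_x, mode_y = xs[mode_idx], ys[mode_idx]
step = xs[1] - xs[0]      # the mode is only located to within one grid step
print(f"mode at x = {mode_x:.2f} (height {mode_y:.3f}), grid step {step:.2f}")
```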
JohanC
  • Small deprecation warnings on `ylim` and had to remove parameter `ls` from `ax.vlines` but overall great answer – kevin_theinfinityfund Feb 19 '21 at 00:22
  • Thanks for the feedback. I couldn't reproduce your problems with `ylim` nor with `ls=` for `ax.vlines`. I now tested with seaborn 0.11.1 and matplotlib 3.3.4 and noticed a problem with `shade=True` not creating the line objects anymore. The updated code should now work for these versions. I also noticed `ax.set_ylim()` isn't needed anymore, so I left it out. – JohanC Feb 19 '21 at 01:11
  • It's very useful, this should be an option in seaborn. – Ghislain Viguier Aug 04 '21 at 13:19
  • Small update because of deprecation in seaborn v0.14.0: replace shade=False with fill=False. Anyway, good answer. – BootMaker Jun 28 '23 at 13:14

With ax.get_ylim() you can get the limits of the current plot as (bottom, top); the pyplot equivalent is plt.ylim().
So, in your case, you can save the limits in ylim once the distribution has been drawn, then draw the line:

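import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.random.normal(20, 3, 250)  # example data
fig, ax = plt.subplots()
sns.kdeplot(x, ax=ax)  # draw the distribution first so its limits are set
ylim = ax.get_ylim()  # save (bottom, top)
ax.plot([x.mean(), x.mean()], ylim)  # vertical line over the full y-range
ax.set_ylim(ylim)  # restore the limits that ax.plot expanded
plt.show()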

Because ax.plot autoscales the axes (adding a small margin around the new line), it changes the y-limits afterwards, so you have to restore them with ax.set_ylim(ylim) as above.
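A minimal sketch of the save-and-restore pattern, assuming a plain matplotlib histogram as a stand-in for the distplot and the non-interactive Agg backend so it runs without a display. It records the limits before drawing, shows that ax.plot pads them through the default y-margin, and then restores them. With this approach the line runs to the top of the axes rather than stopping at the curve.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.random.default_rng(1).normal(0, 1, 500)
fig, ax = plt.subplots()
ax.hist(x, bins=30, density=True)  # stand-in for the distplot (assumption)

ylim = ax.get_ylim()                 # limits that fit the histogram
ax.plot([x.mean(), x.mean()], ylim)  # autoscaling pads the y-range around this line
expanded = ax.get_ylim()
ax.set_ylim(ylim)                    # put the saved limits back
restored = ax.get_ylim()
print("saved:", ylim, "after plot:", expanded, "restored:", restored)
plt.close(fig)
```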

Zephyr