I'd like to plot a histogram in a Jupyter notebook for a dataset whose minimum value is ~8e+01, maximum ~6e+07, mean ~8e+05, median ~1e+05, and mode ~2e+03. I'd also like to show the central tendency lines on the graph. This is my code:

plt.figure(figsize=(7,5))
sns.kdeplot(dataset, shade=True)
plt.vlines(x=np.mean(dataset), ymin=0, ymax=1, color='blue', linestyle='--')
plt.vlines(x=np.median(dataset), ymin=0, ymax=1, color='brown', linestyle='--')
plt.vlines(x=scp.mode(dataset)[0][0], ymin=0, ymax=1, color='red', linestyle='--')
plt.legend(['Dataset', 'Mean', 'Median', 'Mode'], fontsize=12)
plt.show()

However, the resulting graph is:

[image: resulting plot]

I've searched online, but all the answers I found suggested either adding plt.show(), which I have done, or using %matplotlib inline, which only applies to IPython notebook. I can't think of another reason why the graph isn't showing the data properly.

test dataset:

596819.4
642682.4
619509.33
649141.25
545800.05
572019.8
718330.48
830967.23
742220.67
656892.03
1005603.78
653560.14
657444.04
696281.81
709268.62
924993.61
823623.36
734721.29
783935.36
882754.72
821523.45
800603.23
772926.12
800194.34
850931.58
Trenton McKinney
AngryWeeb
  • See also [how to show the median line on a distplot between 0 and the y value?](https://stackoverflow.com/questions/63307440/how-to-plot-a-mean-line-on-a-distplot-between-0-and-the-y-value-of-the-mean) – JohanC Sep 18 '21 at 18:21

1 Answer
  • This happens because ymax=1 is specified, while the maximum of the y-axis for the kdeplot is about 3e-06. The density curve is there; it just can't be seen because the vlines stretch the y-axis all the way to 1.
    • .vlines interprets ymin and ymax as data values on the y-axis.
  • Get the max value of the y-ticks and use that as ymax.
  • Tested in python 3.8.11, matplotlib 3.4.3, seaborn 0.11.2
import seaborn as sns
import numpy as np
from scipy.stats import mode
import matplotlib.pyplot as plt

ds = [596819.4, 642682.4, 619509.33, 649141.25, 545800.05, 572019.8, 718330.48, 830967.23, 742220.67, 656892.03, 1005603.78, 653560.14, 657444.04, 696281.81, 709268.62, 924993.61, 823623.36, 734721.29, 783935.36, 882754.72, 821523.45, 800603.23, 772926.12, 800194.34, 850931.58]

p = sns.kdeplot(ds, shade=True, label='Dataset')
ticks = max(p.get_yticks())  # get the max value of the y-ticks
p.vlines(x=np.mean(ds) , ymin=0, ymax=ticks, color='blue', linestyle='--', label='Mean')
p.vlines(x=np.median(ds), ymin=0, ymax=ticks, color='brown', linestyle='--', label='Median')
p.vlines(x=mode(ds)[0][0], ymin=0, ymax=ticks, color='red', linestyle='--', label='Mode')
p.legend(fontsize=12)
plt.show()
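
One caveat on mode(ds)[0][0]: it relies on scipy.stats.mode returning arrays, which matches the SciPy releases available when this was tested. As far as I know, newer SciPy releases return scalars for 1-D input by default, in which case the double index fails. Below is a minimal sketch that avoids the dependency by using only NumPy. The name smallest_mode is a hypothetical helper, not part of any library.

```python
import numpy as np


def smallest_mode(data):
    """Return the most frequent value; ties resolve to the smallest value."""
    # np.unique returns the sorted distinct values and how often each occurs.
    values, counts = np.unique(np.asarray(data), return_counts=True)
    # argmax picks the first maximum, i.e. the smallest of any tied values.
    return values[np.argmax(counts)]


# 2 and 3 both occur twice, so the smaller one is returned.
print(smallest_mode([3, 1, 2, 2, 3]))
```

The result can be passed as x to vlines or axvline in place of mode(ds)[0][0]. For data with no repeated values every value ties, so this helper returns the minimum.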


  • As noted by @JohanC in a comment, axvline interprets ymin and ymax as fractions of the axes height (0 is the bottom, 1 is the top), so extracting max(p.get_yticks()) isn't necessary.
p = sns.kdeplot(ds, shade=True, label='Dataset')
p.axvline(x=np.mean(ds) , ymin=0, ymax=1, color='blue', linestyle='--', label='Mean')
p.axvline(x=np.median(ds), ymin=0, ymax=1, color='brown', linestyle='--', label='Median')
p.axvline(x=mode(ds)[0][0], ymin=0, ymax=1, color='red', linestyle='--', label='Mode')
p.legend(fontsize=12)
plt.show()
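
The difference between the two calls can be checked without seaborn. The sketch below is an illustration only: it uses a made-up bell-shaped curve with a peak near 3e-06 as a stand-in for the KDE, and the variable names are assumptions. It records the upper y-limit after each call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Stand-in curve with a tiny peak (~3e-06), similar in scale to the KDE above.
x = np.linspace(4e5, 1.1e6, 200)
y = 3e-06 * np.exp(-((x - 7.5e5) / 1.5e5) ** 2)
ax.plot(x, y)
curve_top = ax.get_ylim()[1]  # upper y-limit with only the curve drawn

# axvline: ymin/ymax are fractions of the axes height, so 0..1 spans the
# full height and the y-limits stay where the curve put them.
ax.axvline(x=7.5e5, ymin=0, ymax=1, color="blue", linestyle="--")
after_axvline = ax.get_ylim()[1]

# vlines: ymin/ymax are data values, so ymax=1 stretches the y-axis to about 1
# and flattens the curve against the bottom of the plot.
ax.vlines(x=7.5e5, ymin=0, ymax=1, color="red", linestyle="--")
after_vlines = ax.get_ylim()[1]

print(curve_top, after_axvline, after_vlines)
```

The first two printed values should match, while the third jumps to roughly 1, which is the same effect that hid the density curve in the question.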
Trenton McKinney