Suppose I have the following data:

_sample=np.array([1,2,3,4,5,6,7,8,9,10])

I am plotting the data with seaborn's distplot, which draws a histogram with a KDE curve on top.

[Figure: two distplot panels; left with bins=10, right with bins=[1,2,3,4,5,6,7,8,9,10]]

In the left image, I use bins=10.

  1. The bars have a height of about 0.11, but I expected exactly 0.1, since each of the n = 10 values occurs once (1/n = 0.1).

In the right image, I use bins=[1,2,3,4,5,6,7,8,9,10].

  1. Most (90%) of the bars have a height of 0.10, but a few reach 0.20 on the y-axis. Why does the right side of the plot reach 0.20 when every bar should have a height of 0.10?

Please let me know what I am missing; I am not able to understand this.

Update: adding the code.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Two side-by-side axes: an integer bin count (left) vs. explicit bin edges (right)
_fig, _ax = plt.subplots(1, 2, figsize=(15, 5))
_sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
sns.distplot(_sample, bins=10, ax=_ax[0], axlabel='bins=10')
sns.distplot(_sample, bins=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ax=_ax[1], axlabel='bins=[1,2,3,4,5,6,7,8,9,10]')
plt.show()
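To check the bar heights numerically instead of reading them off the plot, the same binning can be reproduced with np.histogram. This is only a minimal sketch: it assumes the histogram is normalized to a density when the KDE curve is drawn, so that each height is count / (n * bin_width). The names `heights_10` and `heights_edges` are introduced here purely for illustration.

```python
import numpy as np

sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# bins=10: ten equal-width bins spanning min..max, so each bin is 0.9 wide
heights_10, edges_10 = np.histogram(sample, bins=10, density=True)

# bins=[1, ..., 10]: ten edges delimit nine bins of width 1;
# the last bin is closed on the right, so it holds both 9 and 10
heights_edges, edges_list = np.histogram(
    sample, bins=np.arange(1, 11), density=True
)

print(np.round(heights_10, 3))
print(np.round(heights_edges, 3))
```

Under that assumption, the first call gives 1 / (10 * 0.9) ≈ 0.111 for every bin, and the second gives 0.1 for the first eight bins and 0.2 for the last one.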
Lijin Durairaj
  • Note that histograms are mainly meant for continuous distributions. If you have discrete data, you need to position the bin boundaries very carefully. For kdeplots, it is moreover assumed that the distribution is smooth and falls off slowly at the edges. Also note that, due to floating point accuracy, sample values that fall directly on a bin boundary can end up in either bin. Further, note that setting `bins=[1,2,3,4,5,6,7,8,9,10]` sets 10 boundaries, which only delimit 9 bins. More correct bins could be `bins=np.arange(0.5, 11)`. Please note that a reproducible example would be helpful. – JohanC Sep 08 '20 at 11:44
  • See also [Matplotlib histogram not counting correctly the number of values in each bin](https://stackoverflow.com/a/63679392/12046409) – JohanC Sep 08 '20 at 11:45
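The bin edges suggested in the first comment can be checked numerically with np.histogram. This is a minimal sketch, assuming the plotted histogram is normalized as a density; `edges`, `counts`, and `heights` are illustrative names, not part of the original code.

```python
import numpy as np

sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Edges at 0.5, 1.5, ..., 10.5: eleven boundaries, ten bins of width 1,
# each centered on one integer so no sample sits on a boundary
edges = np.arange(0.5, 11)

counts, _ = np.histogram(sample, bins=edges)
heights, _ = np.histogram(sample, bins=edges, density=True)

print(len(edges) - 1)  # number of bins
print(counts)
print(heights)
```

With these edges every bin contains exactly one sample and has width 1, so each density height comes out as 1 / (10 * 1) = 0.1.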
