
When I draw a distplot for discrete variables, the distribution may not look the way I expect. For example:

There are gaps between the bars of the histogram, so the KDE curve appears "lower" on the y axis than the bars.

In my actual work, it was even worse.

I think this may be because the "width" or "weight" of each bar is not 1. But I couldn't find any parameter to adjust it.

I'd like to draw a curve that is as high as the bars (and smoother).

Zealseeker
  • I would say both plots look reasonable. What exactly do you want to change? – ImportanceOfBeingErnest Feb 26 '18 at 14:20
  • @ImportanceOfBeingErnest Thanks. I have updated the question and you can see what I want. – Zealseeker Feb 26 '18 at 14:28
  • I see. So a kernel density estimate as produced by distplot is probably not what you're after. Maybe binning the data manually and plotting the green line would be what you want. You can then also smooth the line with a Gaussian filter or similar. – ImportanceOfBeingErnest Feb 26 '18 at 14:32
  • @ImportanceOfBeingErnest Yes. I have previously fitted them manually using a sigmoid-like curve (which works better than the normal distribution built into seaborn). However, that's not the point of this post. – Zealseeker Feb 26 '18 at 14:55
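The approach suggested in the comments (bin the data manually, then smooth the resulting line with a Gaussian filter) could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread: it uses NumPy only, assumes integer-valued data, puts one bin on each integer, and treats `sigma` as the filter width measured in bins. The helper names `gaussian_kernel` and `binned_and_smoothed` are hypothetical. `scipy.ndimage.gaussian_filter1d` does a similar job if SciPy is available, although its default edge handling differs.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights on an integer grid, truncated at 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()


def binned_and_smoothed(data, sigma=1.0):
    """Count each integer value, convert the counts to relative frequencies,
    then smooth the frequencies with a discrete Gaussian filter."""
    data = np.asarray(data).astype(int)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    lo, hi = data.min(), data.max()
    counts = np.bincount(data - lo, minlength=hi - lo + 1)  # one bin per integer
    freq = counts / counts.sum()
    # Zero-pad so the smoothed curve can extend past the observed range
    # without losing probability mass at the edges.
    freq = np.pad(freq, radius)
    values = np.arange(lo - radius, hi + radius + 1)
    smooth = np.convolve(freq, kernel, mode="same")
    return values, freq, smooth


# Hypothetical example data, generated the same way as in the first answer.
rng = np.random.default_rng(0)
n = np.round(rng.normal(5, 2, size=10_000))
values, freq, smooth = binned_and_smoothed(n, sigma=1.0)
```

With matplotlib, `plt.bar(values, freq)` followed by `plt.plot(values, smooth)` would draw the bars and the smoothed line on the same y scale, since both are relative frequencies.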

3 Answers


One way to deal with this problem might be to adjust the "bandwidth" of the KDE (see the documentation for seaborn.kdeplot()):

import numpy as np
import seaborn as sns
n = np.round(np.random.normal(5, 2, size=(10000,)))
sns.distplot(n, kde_kws={'bw': 1})
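To see why a larger bandwidth helps, here is a minimal Gaussian KDE written with NumPy. This is a sketch, not seaborn's implementation: it assumes `bandwidth` is the standard deviation of each kernel in data units, whereas how seaborn interprets `bw` has varied between versions and backends (newer releases use `bw_method`/`bw_adjust` instead). Because the data is discrete, the KDE is computed from (unique value, count) pairs, which gives the same result as summing one kernel per sample. The names `weighted_gaussian_kde` and `dip_ratio` are hypothetical.

```python
import numpy as np


def weighted_gaussian_kde(values, counts, grid, bandwidth):
    """Gaussian KDE of data summarized as (unique value, count) pairs.

    `bandwidth` is the standard deviation of each kernel, in data units.
    """
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels @ counts / counts.sum()


rng = np.random.default_rng(0)
n = np.round(rng.normal(5, 2, size=10_000))  # rounded, so only integers occur
values, counts = np.unique(n, return_counts=True)

grid = np.linspace(-3, 13, 321)
narrow = weighted_gaussian_kde(values, counts, grid, bandwidth=0.2)  # one bump per integer
wide = weighted_gaussian_kde(values, counts, grid, bandwidth=1.0)    # smooth curve


def dip_ratio(bandwidth):
    """Density halfway between two integers divided by the density at an integer.

    Close to 0 means deep dips between integers; close to 1 means a smooth curve.
    """
    at_int, halfway = weighted_gaussian_kde(values, counts, np.array([5.0, 5.5]), bandwidth)
    return halfway / at_int
```

With a kernel much narrower than the spacing between integers, the estimate collapses into separate bumps; once the bandwidth is about the size of that spacing, the bumps merge into one curve.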


EDIT: Here is an alternative that uses a different y-axis scale for the bars and the KDE:

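import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

n = np.round(np.random.normal(5, 2, size=(10000,)))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y axis sharing the same x axis

sns.distplot(n, kde=False, ax=ax1)  # histogram of raw counts on the left axis
sns.distplot(n, hist=False, ax=ax2, kde_kws={'bw': 1})  # KDE on the right axis
plt.show()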
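Instead of a second y axis, the KDE could also be rescaled into count units. A density integrates to 1, while a histogram of N samples with bin width w has a total area of N·w, so multiplying the density by N·w puts the curve on the same scale as the count bars. The sketch below assumes integer data with one bar per integer (w = 1) and a Gaussian kernel whose standard deviation `bandwidth` is given in data units. It uses NumPy only, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.round(rng.normal(5, 2, size=10_000))

values, counts = np.unique(n, return_counts=True)  # bar heights (raw counts)
bin_width = 1.0  # assumption: one bar per integer value

grid = np.linspace(values.min() - 3, values.max() + 3, 601)
dx = grid[1] - grid[0]

bandwidth = 1.0  # assumed kernel standard deviation, in data units
z = (grid[:, None] - values[None, :]) / bandwidth
kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = kernels @ counts / counts.sum()  # integrates to about 1

# Rescale the density so its area matches the bars' total area, len(n) * bin_width.
kde_counts = density * len(n) * bin_width
```

Plotting `plt.bar(values, counts)` and `plt.plot(grid, kde_counts)` on a single axis should then show a curve roughly as tall as the bars. The smoothed peak is still somewhat lower than the tallest bar, because the kernel spreads mass out.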


Diziet Asahi
  • The picture is not that "beautiful". I wonder if I can make the discrete variables have a density of 1 so that their area will be larger and the "kde curve" will be as high as the bars. – Zealseeker Feb 26 '18 at 14:52
  • It depends on what you are trying to achieve. If it is just for display purposes, then you can do whatever scaling you want. I edited my answer to propose a solution that scales the KDE to the height of the bars. – Diziet Asahi Feb 26 '18 at 15:12
  • @Zealseeker Maybe you would like to specify `bins` to equal the number of unique values of the variable you are plotting (it is discrete, after all). That way I got wider and nicer histogram bars. – Rafs Aug 20 '19 at 15:48

If the problem is that there are some empty bins in the histogram, it probably makes sense to specify the bins to match the data. In this case, use bins=np.arange(0,16) to get one bin for each integer in the data.

import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
import seaborn as sns

n = np.random.randint(0,15,10000)
sns.distplot(n, bins=np.arange(0,16), hist_kws=dict(ec="k"))

plt.show()
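The reason empty bins make the KDE look "low" is the density normalization: each bar's height is count / (N · bin width), so bars always have a total area of 1. If the bins are narrower than the spacing between the integer values, some bins stay empty and the occupied ones have to be taller to keep the area at 1. The NumPy sketch below shows this with `np.histogram(..., density=True)`. It assumes the histogram is density-normalized, which is how distplot draws it when a KDE is also shown. The bin width of 0.5 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.integers(0, 15, size=10_000)  # integers 0..14, as in the answer above

# Bin edges at every integer: each bin [k, k+1) holds exactly one value,
# so the density-normalized heights equal the relative frequencies.
heights_int, _ = np.histogram(n, bins=np.arange(0, 16), density=True)
pmf = np.bincount(n, minlength=15) / n.size

# Bins of width 0.5: every other bin is empty, so the occupied bins are
# twice as tall to keep the total area equal to 1.
heights_half, _ = np.histogram(n, bins=np.arange(0, 15.5, 0.5), density=True)
```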


ImportanceOfBeingErnest

It seems sns.distplot (or displot https://seaborn.pydata.org/generated/seaborn.displot.html) is meant for plotting histograms, not bar plots. Both histograms and KDEs (a KDE is an approximation of the probability density function) make sense only for continuous random variables. So in your case, since you'd like to plot the distribution of a discrete random variable, you should use a bar plot and plot the probability mass function (PMF) instead.

import numpy as np
import matplotlib.pyplot as plt

array = np.random.randint(15, size=10000)
unique, counts = np.unique(array, return_counts=True)
freq = counts / counts.sum()  # convert counts into relative frequencies

# plotting the bars
plt.bar(unique, freq)

# naming the x axis
plt.xlabel('Value')
# naming the y axis
plt.ylabel('Frequency')

# Title
plt.title("Discrete uniform distribution")

# function to show the plot
plt.show()
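A variant of the same PMF computation uses `np.bincount`. For non-negative integers, it also reports a count of 0 for values that never occur, which `np.unique` silently skips. It also makes it easy to compare the empirical PMF with the theoretical one. The sketch assumes the same setup as above (10,000 draws from the integers 0..14) and a fixed seed. The name `max_error` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
array = rng.integers(0, 15, size=10_000)  # discrete uniform on 0..14

counts = np.bincount(array, minlength=15)  # one count per value 0..14, zeros included
pmf = counts / counts.sum()                # empirical probability mass function
expected = np.full(15, 1 / 15)             # theoretical PMF of the discrete uniform
max_error = np.abs(pmf - expected).max()   # largest gap between the two
```

`plt.bar(np.arange(15), pmf)` would then draw the same bars as above, and `plt.plot(np.arange(15), expected, "o")` could overlay the theoretical values.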