
Seaborn's distplot has a flag, norm_hist. When it is switched on, the histogram is normalized as a density, so that the area under the bars integrates to 1.

Here is a small example:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

a = np.random.randn(10000)
a = a[(-2 < a) & (a < 2)]  # keep only samples strictly inside (-2, 2)

sns.distplot(a=a, bins=np.linspace(-2, 2, 6), norm_hist=True, kde=False)
plt.show()

This creates the following plot:

[Image: normalized distplot of the truncated samples]

It looks roughly like a triangle, which makes it easy to see that it integrates to 1: a triangle of width 4 and height 0.5 has area 4 * 0.5 / 2 = 1.
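The area check can also be done numerically. The sketch below assumes that norm_hist=True corresponds to the density=True normalization of numpy/matplotlib (distplot appears to forward it to matplotlib's hist as density), and uses a seeded generator instead of np.random.randn so the numbers are reproducible. It shows that the bar areas sum to 1 while the bar heights do not.

```python
import numpy as np

# Same data as in the question: standard-normal samples truncated to (-2, 2).
rng = np.random.default_rng(0)
a = rng.standard_normal(10000)
a = a[(-2 < a) & (a < 2)]

bins = np.linspace(-2, 2, 6)  # 5 bins, each 0.8 wide

# density=True: height = count / (total count * bin width)
heights, edges = np.histogram(a, bins=bins, density=True)
widths = np.diff(edges)

area = np.sum(heights * widths)  # area under the bars: 1
height_sum = np.sum(heights)     # sum of bar heights: 1 / 0.8, not 1

print(f"area under bars: {area:.6f}")
print(f"sum of heights:  {height_sum:.6f}")
```

Because every bin here is 0.8 wide, the heights sum to 1 / 0.8 = 1.25; the mismatch comes entirely from the division by the bin width.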

Is it possible to normalize this while ignoring the width of each bin, so that the bar heights add up to 1? (This is what norm_hist would produce if every bin had a width of 1.)

In my scenario, users are binned based on a continuous feature. In the distplot, I would like to show what percentage of users falls into each bin. But when the feature is rescaled, the heights of the corresponding bars change, because with norm_hist each height is the bin's share divided by the bin width.
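A minimal numpy sketch of the distinction, assuming a hypothetical helper bin_fractions and a made-up feature: dividing the raw counts by the total gives per-bin shares that sum to 1 and stay the same when the feature and the bin edges are rescaled together, whereas the density heights shrink by the scale factor. A factor of 2 is used so the rescaling is exact in floating point.

```python
import numpy as np

def bin_fractions(values, bins):
    """Hypothetical helper: share of samples in each bin (shares sum to 1).

    Unlike a density, this does not divide by the bin width.
    """
    counts, edges = np.histogram(values, bins=bins)
    return counts / counts.sum(), edges

rng = np.random.default_rng(0)
# Hypothetical continuous feature for a set of users.
feature = rng.standard_normal(10000)
feature = feature[(-2 < feature) & (feature < 2)]
bins = np.linspace(-2, 2, 6)
scale = 2.0  # e.g. a change of units; a power of two keeps comparisons exact

fractions, _ = bin_fractions(feature, bins)
scaled_fractions, _ = bin_fractions(feature * scale, bins * scale)

# For comparison: density heights, which depend on the bin width.
density, _ = np.histogram(feature, bins=bins, density=True)
scaled_density, _ = np.histogram(feature * scale, bins=bins * scale, density=True)

print(fractions, fractions.sum())
print("shares unchanged:", np.allclose(fractions, scaled_fractions))
print("density halved:  ", np.allclose(scaled_density, density / scale))
```

The shares are exactly what the question asks the bars to show; only the way they are plotted remains to be chosen.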

lhk
  • See https://stackoverflow.com/questions/3866520/plotting-histograms-whose-bar-heights-sum-to-1-in-matplotlib – ImportanceOfBeingErnest Oct 03 '19 at 17:17
  • Thanks, I hadn't seen that. But the accepted answer only reproduces the problem, and the answer that should have been accepted describes a solution for pyplot. It is based on a weights parameter, which seaborn does not expose. – lhk Oct 03 '19 at 17:24
  • Seaborn has a *hist_kws* parameter. But it's not clear why you would need to use seaborn anyway. – ImportanceOfBeingErnest Oct 03 '19 at 17:28
  • I suspect an XY problem here. What statistical function do you want to plot: a CDF? A PDF? Why do you want the y values to lie in [0, 1]? This reminds me of https://stackoverflow.com/questions/55128462/how-to-normalize-seaborn-distplot/55131267#55131267 – LoneWanderer Oct 05 '19 at 21:15
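The weights approach mentioned in the comments can be sketched with plain matplotlib: giving every sample the weight 1/N makes each bar's height equal to that bin's share of the samples, regardless of bin width. This is a sketch under assumptions, not a seaborn-specific answer. Since distplot appears to forward hist_kws to matplotlib's hist, the same weights could presumably be passed there (with norm_hist=False and kde=False), and seaborn 0.11+ also offers histplot(..., stat="probability"), documented as making bar heights sum to 1. The Agg backend is used only so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(0)
a = rng.standard_normal(10000)
a = a[(-2 < a) & (a < 2)]
bins = np.linspace(-2, 2, 6)

# Every sample gets weight 1/N, so each bar height is the share of
# samples in that bin and the heights sum to 1.
weights = np.full(len(a), 1.0 / len(a))

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(a, bins=bins, weights=weights)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))  # show 0.25 as 25%
ax.set_ylabel("share of samples")
# plt.show() or fig.savefig(...) to display or store the figure
```

Rescaling the feature and the bins together leaves these heights unchanged, which is the behaviour the question asks for.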

0 Answers