
Using this, I'm pulling the plotted data from a `seaborn.distplot`. Surprisingly, the histogram and the KDE return different x-axis values. My first, practical question: how are these two sets of x values combined in the same plot? My second, theoretical question: why don't the bins match? Shouldn't both densities have been created with the same underlying bins?

Machavity
hihik
  • See [wikipedia](https://en.wikipedia.org/wiki/Kernel_density_estimation) about KDEs and note how binning is not involved. The KDE also involves smoothing, so if you would calculate a histogram based on a KDE, it almost always would be different from the histogram of the original data. – JohanC Jun 11 '20 at 16:12
  • See [How does distplot calculate the kde curve?](https://stackoverflow.com/questions/61228160/how-does-distplot-calculate-the-kde-curve/61230712#61230712) about calculating a KDE curve "manually" and how it can be combined with a histogram. It's just a sum of Gaussians, one per sample. – JohanC Jun 11 '20 at 16:35
  • Thanks, @JohanC, clearly I have not yet grasped KDE very well. – hihik Jun 11 '20 at 18:18

1 Answer


I'm not sure exactly what kind of answer you're looking for with the first question, but they are plotted independently, and nothing in matplotlib requires that two artists drawn on the same axes have identical x axis data.
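As a rough illustration of two artists sharing one axes while carrying different x data, here is a minimal matplotlib sketch. The sample data, the 10-bin histogram, the 200-point grid, and the standard normal curve standing in for a KDE are all assumptions made for the example, not what `distplot` does internally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)  # made-up sample, for illustration only

fig, ax = plt.subplots()

# Artist 1: a density-normalized histogram with 10 bins (11 bin edges).
heights, edges, _ = ax.hist(data, bins=10, density=True, alpha=0.4)

# Artist 2: a smooth curve evaluated on its own, much finer grid.
# A standard normal pdf is used here purely as a stand-in for a KDE.
grid = np.linspace(-4, 4, 200)
curve = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
(line,) = ax.plot(grid, curve)

fig.savefig("two_artists.png")
```

The bars and the line keep their own x values; the axes only provides a common coordinate system and scales its limits to cover both.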

To answer the second question, kernel density estimation doesn't use binning. Roughly speaking, it replaces each observation with a kernel, sums the kernels at each point in an evaluation grid, and normalizes. (Illustrated here). The histogram is also normalized to show a density, so you can plot one over top of the other and they will match. But there doesn't have to be any correspondence between the histogram bins and the evaluation grid for the KDE.
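The description above can be sketched with plain NumPy. This is only a sketch under stated assumptions: the sample data is made up, the Gaussian kernel and the Scott-style bandwidth are choices made for the example and are not claimed to match seaborn's defaults, and the 200-point grid is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # made-up sample

# Histogram normalized to a density: heights times bin widths sum to 1.
heights, edges = np.histogram(data, bins=10, density=True)

# Bandwidth from a Scott-style rule of thumb (an assumption for this sketch).
bw = data.std(ddof=1) * len(data) ** (-1 / 5)

# Evaluation grid for the KDE; it has nothing to do with the histogram bins.
grid = np.linspace(data.min() - 3 * bw, data.max() + 3 * bw, 200)

# One Gaussian kernel per observation, summed at each grid point, then normalized.
z = (grid[:, None] - data[None, :]) / bw
kde = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))

# Both estimates integrate to roughly 1, which is why they overlay sensibly.
hist_area = (heights * np.diff(edges)).sum()
kde_area = (0.5 * (kde[1:] + kde[:-1]) * np.diff(grid)).sum()  # trapezoid rule
```

Here `edges` has 11 values and `grid` has 200, yet both curves are densities in the same units, so they can be drawn on the same axes.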

mwaskom
  • I'm star-struck, didn't expect YOU would answer my question. Do you mind explaining the mechanics of the 2 x-axis plotting - are both merged into a single series and then Ys are plotted to that? Sorry if I'm not explaining it right - I'm coming from Excel charting background where 2 x-axis is not possible. – hihik Jun 11 '20 at 18:01
  • Well, as explained in the [previously linked post](https://stackoverflow.com/questions/61228160/how-does-distplot-calculate-the-kde-curve/61230712#61230712), you just use matplotlib to draw a KDE curve with an x-axis going from the smallest to the largest `x`, and a normalized histogram of the x-values. A distplot of a 2D distribution is quite a bit more complex, though. – JohanC Jun 11 '20 at 18:23
  • There's only one x *axis*, but you can draw multiple plots that have different x *data*, as long as they're in the same units. – mwaskom Jun 11 '20 at 19:17