
The pandas plot.kde() method (Series.plot.kde() or DataFrame.plot.kde()) is handy for plotting the estimated density function of a continuous random variable. It takes data x as input and displays the estimated density p(x) as its output.

How can I extract the values it computes? Instead of just plotting the smoothed density, I would like an array or pandas Series that contains the density values it computed internally.

If this can't be done with pandas kde, please point me to an equivalent in scipy or another library.

develarist

1 Answer


There are several ways to do that: you can either compute the values yourself or read them off the plot.

  1. As pointed out in the comment by @RichieV following this post, you can extract the data from the plot using
data.plot.kde().get_lines()[0].get_xydata()
  2. Use seaborn and then the same as in 1):

You can use seaborn to estimate the kernel density and then matplotlib to extract the values (as in this post). You can either use distplot or kdeplot:

import seaborn as sns

# kdeplot
x, y = sns.kdeplot(data).get_lines()[0].get_data()
# distplot (deprecated since seaborn 0.11; prefer kdeplot)
x, y = sns.distplot(data, hist=False).get_lines()[0].get_data()

  3. You can use scipy.stats.gaussian_kde directly, which is what pandas uses under the hood to estimate the kernel density:
import scipy.stats

density = scipy.stats.gaussian_kde(data)

and then you can use this to evaluate it on a set of points:

import numpy as np
x = np.linspace(0, 80, 200)  # evaluation grid; adjust the range to your data
y = density(x)
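
To tie the third method together, here is a minimal, self-contained sketch. The synthetic `data`, the padded evaluation grid, and the names `grid`, `kde_values`, and `area` are assumptions made for illustration, not part of the original answer. Padding the grid by half the sample range on each side is meant to resemble the default grid pandas appears to use when plotting, but treat that as an assumption and check your pandas version. Note that the returned values are densities rather than probabilities, so they integrate to about 1 instead of summing to 1.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic sample standing in for the real data (assumption for illustration).
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(loc=40.0, scale=10.0, size=500))

# Fit a Gaussian-kernel KDE; with no bw_method given, scipy uses Scott's rule.
density = gaussian_kde(data.dropna().to_numpy())

# Evaluation grid: pad the observed range by half its width on each side
# (assumed here to approximate the grid pandas uses for its KDE plot).
lo, hi = data.min(), data.max()
pad = 0.5 * (hi - lo)
grid = np.linspace(lo - pad, hi + pad, 1000)

# Density values as a pandas Series indexed by the grid points.
kde_values = pd.Series(density(grid), index=grid, name="density")

# Sanity check: trapezoidal integration of the density over the grid.
heights = kde_values.to_numpy()
area = float(np.sum((heights[1:] + heights[:-1]) / 2 * np.diff(grid)))
```

To get an approximate probability for an interval, integrate the density over it, for example with `density.integrate_box_1d(low, high)`.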
My Work
  • For the third method, what if the data is known to be non-gaussian? – develarist Aug 05 '20 at 06:52
  • That's a tricky issue: neither `scipy` nor anything built on top of it, like `pandas`, supports kernels other than the Gaussian one. If you need that, I recommend using `statsmodels`. I also recommend this post about other kernels: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/#Kernels or consult `scikit-learn`: https://scikit-learn.org/stable/modules/density.html#kernel-density-estimation. – My Work Aug 05 '20 at 06:56
  • `pandas.plot.kde()` will graphically display the estimated density of anything you send it though, whether it be non-normal or non-unimodal – develarist Aug 05 '20 at 06:57
  • The [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde) docs say: `The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed.` As for non-normal data: yes, it will always return something; the question is what. I recommend the previous links. – My Work Aug 05 '20 at 08:30