24

I use `sns.distplot` to plot a univariate distribution of observations. However, I need not only the chart but also the data points. How do I get the data points from the matplotlib `Axes` returned by `distplot`?

tesgoe

4 Answers

31

You can read the data back from the matplotlib `Axes` that `distplot` returns. For instance, to get the first line:

sns.distplot(x).get_lines()[0].get_data()

This returns two numpy arrays containing the x and y values for the line.

For the bars, information is stored in:

sns.distplot(x).patches

Each entry in `patches` is a rectangle, and you can read its height with its `get_height()` method:

[h.get_height() for h in sns.distplot(x).patches]
ekhumoro
Nils Gudat
  • 8
    This is not strictly reliable. If there were any lines on the `Axes` before the call to `distplot` you will get the data from that line. – tacaswell May 22 '16 at 19:23
  • 4
    Another tip: to get the bin left edges, widths as well as the heights do: `l = [[h.xy[0], h.get_width(), h.get_height()] for h in sns.distplot(x).patches]` – KamKam Feb 27 '20 at 11:26
  • 2
    I have just tested the solution and it does not work for me as 'get_lines()' is not a valid method for a FacetGrid Object. I have succeeded using this answer : https://stackoverflow.com/questions/46248348/seaborn-matplotlib-how-to-access-line-values-in-facetgrid – eidal Feb 10 '21 at 13:41
7

If you want the KDE values for a histogram, you can compute them directly with scikit-learn's `KernelDensity` class instead:

import numpy as np
import pandas as pd
from sklearn.neighbors import KernelDensity

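# Load the data and reshape the column to (n_samples, 1), as scikit-learn expects
ds = pd.read_csv('data-to-plot.csv')
X = ds.loc[:, 'Money-Spent'].values[:, np.newaxis]

# Fit a Gaussian KDE; the bandwidth is a tunable parameter
kde = KernelDensity(kernel='gaussian', bandwidth=0.75).fit(X)

# Evaluate the density on a grid of 100 points between 0 and 5
x = np.linspace(0, 5, 100)[:, np.newaxis]

# score_samples returns log-densities, so exponentiate them
log_density_values = kde.score_samples(x)
density = np.exp(log_density_values)

The resulting `density` array:
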
array([1.88878660e-05, 2.04872903e-05, 2.21864649e-05, 2.39885206e-05,
       2.58965064e-05, 2.79134003e-05, 3.00421245e-05, 3.22855645e-05,
       3.46465903e-05, 3.71280791e-05, 3.97329392e-05, 4.24641320e-05,
       4.53246933e-05, 4.83177514e-05, 5.14465430e-05, 5.47144252e-05,
       5.81248850e-05, 6.16815472e-05, 6.53881807e-05, 6.92487062e-05,
       7.32672057e-05, 7.74479375e-05, 8.17953578e-05, 8.63141507e-05,
       ..........................
       ..........................
       3.93779919e-03, 4.15788216e-03, 4.38513011e-03, 4.61925890e-03,
       4.85992626e-03, 5.10672757e-03, 5.35919187e-03, 5.61677855e-03])
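
For readers without the CSV file, here is a self-contained sketch of the same approach. The synthetic sample, the bandwidth of 0.2 and the grid size are assumptions chosen for illustration; they are not taken from the answer above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic sample standing in for the 'Money-Spent' column (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(loc=2.5, scale=0.5, size=500)[:, np.newaxis]

# Fit a Gaussian KDE; the bandwidth here is an arbitrary illustrative choice
kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(X)

# Evaluate the density on an evenly spaced grid
grid = np.linspace(0, 5, 200)
density = np.exp(kde.score_samples(grid[:, np.newaxis]))

# Rough check: a density should integrate to about 1 over the grid
dx = grid[1] - grid[0]
area = float(density.sum() * dx)
```

Note that the curve depends on the bandwidth, so these values will only match what seaborn draws if the two bandwidths agree.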
2

With newer versions of seaborn this no longer works. First, `distplot` has been deprecated in favor of `displot`. Second, `displot` returns a `FacetGrid` rather than an `Axes`, so calling `get_lines()` on the result raises `AttributeError: 'FacetGrid' object has no attribute 'get_lines'`.

themellion
  • `displot` now returns a `FacetGrid`, you have to access individual axes of the `FacetGrid` to access their `lines` or `patches` attributes. See https://stackoverflow.com/a/75721847/13636407 – paime Mar 13 '23 at 12:27
0

This extracts the KDE curve from the plot and draws it with `plt.plot`:

line = sns.distplot(data).get_lines()[0]
plt.plot(line.get_xdata(), line.get_ydata())

WS100002