4

Is it possible to extract the data from a sns.kdeplot() before plotting, i.e. without calling y.get_lines()[0].get_data() after plotting?

Trenton McKinney
Michael Berry
  • 1
    extract what data? you need some data to start with in order to generate the plot. please post some code so we have something to go with. – Constantino Sep 11 '15 at 13:26
  • `density_data = np.repeat(df.loc[:,"Position"].values.tolist(), df.loc[:,"BaseCount"].values)` `sns.kdeplot(density_data)` This obviously generates density data which is automatically plotted. Is it possible to extract this data – Michael Berry Sep 11 '15 at 14:05
  • 2
    You should use the scipy or statsmodels KDE functions. – mwaskom Sep 11 '15 at 15:01
  • Both of these functions output an object. Still can't figure out how to extract actual density values. – Michael Berry Sep 14 '15 at 07:54
  • 1
    It doesn't seem that you are able to do that. These functions are not designed for this. Now you got two ways of achieving your goal: mimic the calculation in statsmodels/scipy (look at seaborn's sources) or calculate it yourself (again: scipy, statsmodels or even scikit-learn) + plot it yourself without seaborn. – sascha Sep 30 '15 at 22:28
  • Thanks! Finally got it right with statsmodels, using the .density attribute – Michael Berry Oct 08 '15 at 09:54

2 Answers

4

This can be done by extracting the line data from the matplotlib Axes object:

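import numpy as np
import matplotlib.pyplot as plt
from seaborn import kdeplot

my_data = np.random.randn(1000)
my_kde = kdeplot(my_data)  # returns the matplotlib Axes it drew on
line = my_kde.lines[0]  # the KDE curve is the first line on the Axes
x, y = line.get_data()

# re-plot only the part of the curve where x > 0
fig, ax = plt.subplots()
ax.plot(x[x > 0], y[x > 0])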

Alternatively, compute the density directly with statsmodels, which does not require drawing the seaborn plot first:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

dens = sm.nonparametric.KDEUnivariate(np.random.randn(1000))
dens.fit()
x = np.linspace(0, 1, 100)  # restrict range to [0, 1]
y = dens.evaluate(x)
plt.plot(x, y)
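
The scipy route suggested in the comments works along the same lines. The following is only a minimal sketch: it uses `scipy.stats.gaussian_kde` with its default bandwidth (Scott's rule), and the evaluation grid of 200 points over the data range is an arbitrary choice, so the curve is not guaranteed to match seaborn's output exactly.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
my_data = rng.standard_normal(1000)

kde = gaussian_kde(my_data)  # default bandwidth: Scott's rule

# Evaluate the density on a grid of our choosing (no plot is created)
x = np.linspace(my_data.min(), my_data.max(), 200)
y = kde(x)
```

The resulting `x` and `y` can then be passed to `plt.plot(x, y)` or used in further calculations.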
lewiso1
0

Based on the statsmodels documentation:

import numpy as np
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt

# generate bimodal distribution
X1 = np.random.normal(100, 10, 250)
X2 = np.random.normal(10, 20, 250)
X = np.concatenate([X1, X2])

# get density from seaborn
x, y = sns.kdeplot(X).lines[0].get_data()

# get density from statsmodels
kde = sm.nonparametric.KDEUnivariate(X).fit()
xx, yy = (kde.support, kde.density)

# compare outputs
plt.plot(x, y, label='from sns')
plt.plot(xx, yy, label='from statsmodels')
plt.legend()

Woldemar G