Is it possible to extract the data from a `sns.kdeplot()` before plotting,
i.e. without calling `y.get_lines()[0].get_data()` after plotting?
4

Trenton McKinney

Michael Berry
- Extract what data? You need some data to start with in order to generate the plot. Please post some code so we have something to go with. – Constantino Sep 11 '15 at 13:26
- `density_data = np.repeat(df.loc[:,"Position"].values.tolist(), df.loc[:,"BaseCount"].values)` `sns.kdeplot(density_data)` This obviously generates density data, which is automatically plotted. Is it possible to extract this data? – Michael Berry Sep 11 '15 at 14:05
- You should use the scipy or statsmodels KDE functions. – mwaskom Sep 11 '15 at 15:01
- Both of these functions output an object. Still can't figure out how to extract the actual density values. – Michael Berry Sep 14 '15 at 07:54
- It doesn't seem that you are able to do that. These functions are not designed for this. You have two ways of achieving your goal: mimic the calculation in statsmodels/scipy (look at seaborn's sources), or calculate it yourself (again: scipy, statsmodels or even scikit-learn) and plot it yourself without seaborn. – sascha Sep 30 '15 at 22:28
- Thanks! Finally got it right with statsmodels using the `.density` attribute. – Michael Berry Oct 08 '15 at 09:54
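For readers following the scipy suggestion in the comments, here is a minimal sketch of getting density values without drawing anything. The arrays `positions` and `base_counts` are hypothetical stand-ins for the asker's `Position` and `BaseCount` columns, and the 200-point grid is an arbitrary choice. `scipy.stats.gaussian_kde` returns an object, and calling that object on a grid yields the density values.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in for the asker's DataFrame columns.
rng = np.random.default_rng(0)
positions = np.arange(50)                               # "Position"
base_counts = rng.integers(1, 20, size=positions.size)  # "BaseCount"

# Same expansion as in the comment: repeat each position by its count.
density_data = np.repeat(positions, base_counts)

# Fit the KDE; nothing is plotted at this point.
kde = gaussian_kde(density_data)

# Evaluate the density on a grid of our own choosing.
grid = np.linspace(density_data.min(), density_data.max(), 200)
density = kde(grid)
```

Here `grid` and `density` play the role of the x and y arrays that would otherwise be read back from the plotted line. Recent SciPy versions also accept a `weights` argument to `gaussian_kde`, which may avoid the `np.repeat` expansion; that variant is not shown here.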
2 Answers
4
This can be done by extracting the line data from the matplotlib Axes object:
import numpy as np
import matplotlib.pyplot as plt
from seaborn import kdeplot

my_data = np.random.randn(1000)
my_kde = kdeplot(my_data)  # returns the matplotlib Axes
line = my_kde.lines[0]     # the KDE curve drawn on that Axes
x, y = line.get_data()

# re-plot only the part of the curve where x > 0
fig, ax = plt.subplots()
ax.plot(x[x > 0], y[x > 0])

Alternatively, the statsmodels way:

import statsmodels.api as sm

dens = sm.nonparametric.KDEUnivariate(np.random.randn(1000))
dens.fit()
x = np.linspace(0, 1, 100)  # restrict range to (0, 1)
y = dens.evaluate(x)
plt.plot(x, y)
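To make the `evaluate` step less of a black box, here is a rough sketch of what evaluating a Gaussian KDE on a chosen grid amounts to, using only NumPy. The function name `gaussian_kde_on_grid` is made up for illustration, and the default bandwidth is an assumed Scott-style rule of thumb, so the numbers will not match statsmodels or seaborn exactly.

```python
import numpy as np

def gaussian_kde_on_grid(samples, grid, bandwidth=None):
    """Average one Gaussian kernel per sample at every grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Assumed rule of thumb (Scott-style): h = std * n**(-1/5).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Scaled distance from each grid point to each sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)

x = np.linspace(0, 1, 100)  # restrict range to (0, 1), as above
y = gaussian_kde_on_grid(samples, x)
```

Because the grid is passed in explicitly, the density can be computed on any range before (or without) plotting.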

lewiso1
0
Based on statsmodels's documentation:
import numpy as np
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
# generate bimodal distribution
X1 = np.random.normal(100, 10, 250)
X2 = np.random.normal(10, 20, 250)
X = np.concatenate([X1, X2])
# get density from seaborn
x, y = sns.kdeplot(X).lines[0].get_data()
# get density from statsmodels
kde = sm.nonparametric.KDEUnivariate(X)
kde.fit()  # called separately so this does not rely on fit() returning the instance
xx, yy = (kde.support, kde.density)
# compare outputs
plt.plot(x, y, label='from sns')
plt.plot(xx, yy, label='from statsmodels')
plt.legend()

Woldemar G