
I'm trying to produce two seaborn kernel density plots (kdeplot) side by side.

Three features (Community School?, Economic Need Index, School Income Estimate) are used here. The only categorical feature, 'Community School?', is shown in green and blue, one color per level. 'Economic Need Index' and 'School Income Estimate' are plotted in the first and second kdeplot, respectively.

The image created using the code shown below is the best result I could get, but it has problems.

1) The y-axis scale of the second plot looks wrong (I expected integer-like values, as in the first plot). Correction: kdeplot draws a normalized density (the area under each curve is 1), so the y-axis is correct given the range of its x values.

2) An extra set of axes(?) appears below the two plots.

3) I want to add a title to each subplot.
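Regarding point 1, the numbers can be checked directly: a kernel density estimate has unit area, so its peak height is roughly inversely proportional to the spread of the data. A minimal sketch, assuming SciPy's `gaussian_kde` as a stand-in for seaborn's estimator and using the first ten rows of the sample table below:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Values copied from rows 0-9 of the sample table (no missing values there)
eni = np.array([0.919, 0.641, 0.744, 0.860, 0.730, 0.858, 0.499, 0.833, 0.849, 0.861])
income = np.array([31141.72, 56462.88, 44342.61, 31454.00, 46435.59,
                   39415.45, 43706.73, 28820.67, 34889.24, 35545.10])

results = {}
for name, values in [("Economic Need Index", eni), ("School Income Estimate", income)]:
    kde = gaussian_kde(values)                     # Gaussian KDE, Scott's-rule bandwidth
    area = kde.integrate_box_1d(-np.inf, np.inf)   # total probability mass under the curve
    grid = np.linspace(values.min(), values.max(), 200)
    peak = kde(grid).max()                         # tallest point of the density curve
    results[name] = (area, peak)
    print(f"{name}: area = {area:.4f}, peak height = {peak:.3g}")
```

Both curves have area 1, but the income curve is stretched over a range of tens of thousands, so its peak is orders of magnitude lower than the Economic Need Index peak.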

I found that kdeplot doesn't support hue, so I tried to make it work with FacetGrid. I'm not sure that's the right way to do it, and I'd appreciate a better method.

import matplotlib.pyplot as plt
import seaborn as sns

# df is the schools DataFrame shown below
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.subplots_adjust(wspace=.8)

# height= was called size= before seaborn 0.9
fg = sns.FacetGrid(df, hue='Community School?', height=3)
fg.map(sns.kdeplot, 'Economic Need Index', shade=True, ax=ax1, label='Economic Need Index')
fg.map(sns.kdeplot, 'School Income Estimate', shade=True, ax=ax2, label='School Income Estimate')
plt.show()

[Image: resulting figure]

My dataset looks like this:

    Community School?  Economic Need Index  School Income Estimate
0                 Yes                0.919                31141.72
1                  No                0.641                56462.88
2                  No                0.744                44342.61
3                  No                0.860                31454.00
4                  No                0.730                46435.59
5                  No                0.858                39415.45
6                  No                0.499                43706.73
7                  No                0.833                28820.67
8                  No                0.849                34889.24
9                  No                0.861                35545.10
10                 No                0.559                40809.90
11                Yes                0.917                27881.59
12                Yes                0.832                     NaN
13                 No                0.791                     NaN
14                 No                0.362                63760.00
15                 No                0.771                     NaN
16                 No                0.451                62519.57
17                 No                0.430                57504.48
18                 No                0.448                56787.20
19                 No                0.764                     NaN
20                 No                0.610                     NaN
21                 No                0.257                76833.96
22                 No                0.597                     NaN
23                 No                0.769                32817.79
24                 No                0.858                26114.78
25                 No                0.176               103399.19
26                 No                0.101               144270.13
27                 No                0.293                98455.77
28                 No                0.430                88011.14
29                 No                0.153               102421.46
...

The full dataset can be found here.
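To make the answers below reproducible without the full dataset, here is a minimal sketch that rebuilds the 30 sample rows above as a DataFrame named `df`. The CSV string is just the table transcribed (the name `CSV` is hypothetical), and empty income fields are read as NaN.

```python
import io

import pandas as pd

# The 30 sample rows from the question; an empty income field becomes NaN
CSV = """Community School?,Economic Need Index,School Income Estimate
Yes,0.919,31141.72
No,0.641,56462.88
No,0.744,44342.61
No,0.860,31454.00
No,0.730,46435.59
No,0.858,39415.45
No,0.499,43706.73
No,0.833,28820.67
No,0.849,34889.24
No,0.861,35545.10
No,0.559,40809.90
Yes,0.917,27881.59
Yes,0.832,
No,0.791,
No,0.362,63760.00
No,0.771,
No,0.451,62519.57
No,0.430,57504.48
No,0.448,56787.20
No,0.764,
No,0.610,
No,0.257,76833.96
No,0.597,
No,0.769,32817.79
No,0.858,26114.78
No,0.176,103399.19
No,0.101,144270.13
No,0.293,98455.77
No,0.430,88011.14
No,0.153,102421.46
"""

df = pd.read_csv(io.StringIO(CSV))
print(df.shape)                                   # (30, 3)
print(df['School Income Estimate'].isna().sum())  # 6
```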

Zoe L

2 Answers


Consider melting your DataFrame so that it has one value column plus one indicator column that says whether each value is an Economic Need Index or a School Income Estimate. Then plot with seaborn's FacetGrid alone (no matplotlib subplots() call), adjusting a few of its default plot attributes:

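import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long format: 'Indicator' names the original column, 'value' holds its number
long_df = pd.melt(df, id_vars='Community School?',
                  value_vars=['Economic Need Index', 'School Income Estimate'],
                  var_name='Indicator', value_name='value')
print(long_df.head())
#       Community School?            Indicator  value
# 0                   Yes  Economic Need Index  0.919
# 1                    No  Economic Need Index  0.641
# 2                    No  Economic Need Index  0.744
# 3                    No  Economic Need Index  0.860
# 4                    No  Economic Need Index  0.730

# One facet per indicator, with independent axes because the two scales differ
# (height= was called size= before seaborn 0.9)
fg = sns.FacetGrid(long_df, col='Indicator', hue='Community School?',
                   sharex=False, sharey=False, height=4)
fg.map(sns.kdeplot, 'value', shade=True, label='Data')\
  .add_legend()\
  .set_titles("{col_name}")\
  .set_axis_labels('')
plt.show()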

plt.clf()
plt.close('all')

[Image: FacetGrid output]

Parfait

You are getting the additional figure because FacetGrid creates its own figure when it is called. See the answer to this question for further details. Here, then, is a simpler approach that works. I have also added two optional lines that replace the NaNs in School Income Estimate with the mean for each type of school.
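To see why the extra figure appears, here is a minimal sketch that counts the open pyplot figures before and after creating a FacetGrid. The tiny `df` is a hypothetical stand-in built from a few rows of the sample table; nothing is plotted.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few rows of the sample table, enough to construct a grid
df = pd.DataFrame({
    'Community School?': ['Yes', 'No', 'No', 'Yes'],
    'Economic Need Index': [0.919, 0.641, 0.744, 0.917],
})

plt.close('all')
fig, (ax1, ax2) = plt.subplots(1, 2)        # figure 1: the axes we meant to draw on
n_before = len(plt.get_fignums())

fg = sns.FacetGrid(df, hue='Community School?')  # builds a second, separate figure
n_after = len(plt.get_fignums())

print(n_before, n_after)                    # 1 2
print(fg.axes.flat[0].figure is fig)        # False: the grid's axes live in the new figure
```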

# Optional: fill missing income estimates with the mean for each school type
s = df.groupby(['Community School?'])['School Income Estimate'].transform('mean')
df['School Income Estimate'] = df['School Income Estimate'].fillna(s)
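The fill works because `transform('mean')` returns a Series with the same index as `df`, holding each row's group mean, so `fillna` can fill row by row. A toy sketch with hypothetical values (not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one missing income in each school type
df = pd.DataFrame({
    'Community School?': ['Yes', 'Yes', 'No', 'No', 'No'],
    'School Income Estimate': [30000.0, np.nan, 50000.0, 70000.0, np.nan],
})

# Group means (NaNs skipped) broadcast back onto every row of the group
s = df.groupby('Community School?')['School Income Estimate'].transform('mean')
df['School Income Estimate'] = df['School Income Estimate'].fillna(s)
print(df['School Income Estimate'].tolist())
# [30000.0, 30000.0, 50000.0, 70000.0, 60000.0]
```

Assigning the result back, rather than calling `fillna(..., inplace=True)` on the selected column, avoids relying on chained in-place modification, which recent pandas versions warn about.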

import matplotlib.pyplot as plt
import seaborn as sns

fig, (ax1, ax2) = plt.subplots(1, 2)
no = df['Community School?'] == 'No'     # rows for regular schools
yes = df['Community School?'] == 'Yes'   # rows for community schools
# Left panel: Economic Need Index, one shaded KDE per school type
sns.kdeplot(df.loc[no, 'Economic Need Index'], shade=True, label='No', ax=ax1)
sns.kdeplot(df.loc[yes, 'Economic Need Index'], color='red', shade=True, label='Yes', ax=ax1)
ax1.set_title('KDE of Economic Need Index')
ax1.legend()
# Right panel: School Income Estimate
sns.kdeplot(df.loc[no, 'School Income Estimate'], shade=True, label='No', ax=ax2)
sns.kdeplot(df.loc[yes, 'School Income Estimate'], color='red', shade=True, label='Yes', ax=ax2)
ax2.set_title('KDE of School Income Estimate')
ax2.legend()
plt.show()

[Image: resulting figure]

KRKirov