
I am trying to plot either four KDE subplots or a single plot with four lines. I have two columns:

Region       Charges
southeast    6000
southeast    5422
southwest    3222
northwest    4222
northwest    5555
northeast    6729
... (thousands of rows, 4 regions)

I'd like to visualize the distribution of charges for each of these 4 regions.

I have been experimenting with the code below (I know it's not correct) and get the error message 'Data must be 1-dimensional'.

fig, axes = plt.subplots(2, 2, sharex=True, figsize=(10,5))
fig.suptitle('Bigger 1 row x 2 columns axes with no data')
#axes[0].set_title('Title of the first chart')
reg_name = df2[['region','charges']].set_index('region')
southeast = reg_name.loc['southeast']
southwest = reg_name.loc['southwest']
northwest = reg_name.loc['northwest']

#c = df2.charges.values
#d = df2.region
# Set the dimensions of the plot
#widthInInches = 10
#heightInInches = 4
#plt.figure( figsize=(widthInInches, heightInInches) )
# Draw histograms and KDEs on the diagonal usin
#if( int(versionStrParts[1]) < 11 ):
# Use the older, now-deprectaed form
#   ax = sns.distplot(c,
#      kde_kws={"label": "Kernel Density", "color" : "black"},
#      hist_kws={"label": "Histogram", "color" : 'lightsteelblue'})
#else:
# Use the more recent for

sns.kdeplot(ax=axes[0], x=southeast.index, y=southeast.values, color="black", label="Kernel Density")
axes[0].set_title(southeast.name)

sns.kdeplot(ax=axes[1], x=southwest.index, y=southwest.values, color="black", label="Kernel Density")
axes[1].set_title(southwest.name)
JohanC
Sidster Jinz

1 Answer


sns.kdeplot(ax=axes[0,0], data=df2[df2['region']=='southeast'], x='charges', color='k') should work for your data. Note that axes is a 2D array when both the number of rows and columns are larger than 1.

See "How to plot a mean line on a distplot between 0 and the y value of the mean?" for adding lines for the mean, standard deviation, etc.

Instead of doing the kdeplots one by one, sns.displot can draw them in one go (note that displot is different from distplot):

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(12358)
regions = ['southeast', 'southwest', 'northeast', 'northwest']
df2 = pd.DataFrame({'region': np.repeat(regions, 100),
                    'charge': np.round(np.random.randn(400).cumsum() * 100 + 2000)})

g = sns.displot( kind='kde', data=df2, x='charge',
                 col='region', col_order=regions, col_wrap=2,
                 height=4, aspect=3, color='black')
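for region, ax in g.axes_dict.items():
    data = df2[df2['region'] == region]['charge'].values
    # x/y coordinates of the KDE curve that seaborn drew on this subplot
    xs, ys = ax.get_lines()[0].get_data()
    median = np.median(data)
    mean = data.mean()
    sdev = data.std()
    # dotted blue lines at mean-sdev, mean and mean+sdev, ending on the curve
    ax.vlines([mean-sdev, mean, mean+sdev], 0, np.interp([mean-sdev, mean, mean+sdev], xs, ys), color='b', ls=':')
    # dashed red line at the median, also ending on the curve
    ax.vlines(median, 0, np.interp(median, xs, ys), color='r', ls='--')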
plt.tight_layout()
plt.show()

[Figure: sns.displot with kind='kde']

To draw all the regions into one plot, you can use:

fig, ax = plt.subplots(figsize=(12, 4))
sns.kdeplot(data=df2, x='charge', hue='region', ax=ax)

[Figure: sns.kdeplot with hue per region]

JohanC
  • Drawing charts in Python is still a major pain. I've done it numerous times and I always have to copy it from somewhere. I thank you for such a detailed response! – Original BBQ Sauce Feb 25 '22 at 13:45