
I am plotting the probability distribution of each column of my DataFrame with seaborn's sns.distplot(). For a single column I do:

x = df['A']
sns.distplot(x);

I am trying to use FacetGrid and map to get the plots for all columns at once, like this, but it doesn't work at all:

  g = sns.FacetGrid(df, col = 'A','B','C','D','E')
  g.map(sns.distplot())
Annalix

4 Answers


I think you need to use melt to reshape your DataFrame to long format; see this MCVE:

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.FacetGrid(dfm, col='columns')
g = (g.map(sns.distplot, 'value'))

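To see what the reshaping does, here is a small sketch on a made-up two-column frame: melt stacks every column into a single 'value' column and records the original column name in the var_name column. That single name column is exactly what FacetGrid's col argument needs.

```python
import pandas as pd

# Illustrative wide-format frame: one column per variable
wide = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Long format: one row per (original column, value) pair,
# stacked column by column (all of A first, then all of B)
long = wide.melt(var_name="columns")
print(long)
```

The result has six rows: the `columns` column reads A, A, A, B, B, B and `value` reads 1, 2, 3, 10, 20, 30.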


As of seaborn 0.11.2, using FacetGrid directly is not recommended (and sns.distplot is deprecated). Use the figure-level sns.displot instead:

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(2022)
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')

g = sns.displot(data=dfm, x='value', col='columns', col_wrap=3, common_norm=False, kde=True, stat='density')


Trenton McKinney
Scott Boston

You're getting this wrong on two levels.

  • Python syntax.
    FacetGrid(df, col = 'A','B','C','D','E') is invalid: col is set to 'A', and the remaining strings are passed as positional arguments after a keyword argument. Python does not allow a positional argument to follow a keyword argument, so this raises a SyntaxError.

  • Seaborn concepts.

    • Seaborn expects a single column name for the col or row argument. The DataFrame therefore needs to be in long format, with one column whose values determine which facet (column or row of the grid) each datum belongs to.

    • You should not call the function you pass to map (sns.distplot() with parentheses). Pass the function itself; map calls it for each facet.

Solutions:

  • Loop over columns:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
    
    fig, axes = plt.subplots(ncols=5)
    for ax, col in zip(axes, df.columns):
        sns.distplot(df[col], ax=ax)
    
    plt.show()
    
  • Melt dataframe

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
    
    g = sns.FacetGrid(df.melt(), col="variable")
    g.map(sns.distplot, "value")
    
    plt.show()
    
ImportanceOfBeingErnest

You can use the following:

import matplotlib.pyplot as plt

# df is your existing DataFrame
# list the distinct column dtypes
list(set(df.dtypes.tolist()))
# keep only float and integer columns
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display what has been selected
df_num.head()
# plot one histogram per column
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
plt.show()
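The dtype list ['float64', 'int64'] skips other numeric dtypes such as int32 or float32. The sketch below uses include='number', which pandas documents as selecting all numeric types (bool columns are not included). The small frame is made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative frame mixing numeric widths with non-numeric columns
df = pd.DataFrame({
    "i64": np.arange(5, dtype="int64"),
    "i32": np.arange(5, dtype="int32"),
    "f32": np.linspace(0, 1, 5).astype("float32"),
    "label": list("abcde"),
    "flag": [True, False, True, False, True],
})

explicit = df.select_dtypes(include=["float64", "int64"])  # misses i32 and f32
numeric = df.select_dtypes(include="number")              # every numeric column

print(list(explicit.columns))  # ['i64']
print(list(numeric.columns))   # ['i64', 'i32', 'f32']

numeric.hist(bins=5)
plt.show()
```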
E.Zolduoarrati

I think the easiest approach is to just loop the columns and create a plot.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
for col in df.columns:
    print("Plotting for column {}".format(col))
    df[col].hist(bins=10)   # histogram of this column on the current axes
    plt.show()              # display it before moving to the next column
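A variant of the same loop creates a new figure for each column explicitly. This way the histograms are never drawn onto the same Axes, whether or not plt.show() runs between iterations (with a non-interactive backend, show() does not close the current figure). The per-panel title is an addition for readability.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100, 5)), columns=list('ABCDE'))

figures = []
for col in df.columns:
    fig, ax = plt.subplots()       # fresh figure and Axes for this column
    df[col].hist(bins=10, ax=ax)   # draw the histogram on that Axes only
    ax.set_title("Column {}".format(col))
    figures.append(fig)

plt.show()  # displays all five figures
```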
nishant