I would like to plot a 4 x 5 (rows x columns) grid of seaborn histograms using a for loop. With the code below, each plot comes out in its own figure instead of all together in one figure. This is what I have so far:

from sklearn.datasets import make_classification
import seaborn as sns
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

X_train,y_train = make_classification(n_samples=500, 
                          n_features=20, 
                          n_informative=9, 
                          n_redundant=0, 
                          n_repeated=0, 
                          n_classes=10, 
                          n_clusters_per_class=1,
                          class_sep=9,
                          flip_y=0.2,
                          #weights=[0.5,0.5], 
                          random_state=17)

sns.set_style('darkgrid')

coeff_to_analyze = np.arange(0,20,1)

rows = 4
cols = 5
N_BINS = 60

fig, axes = plt.subplots(rows, cols, figsize=(45,12))

for i in coeff_to_analyze:
    ax = plt.subplot(rows, cols, i+1)
    sns.displot(X_train[i, :], bins=60, kde=True)
    ax.set_title(f'Coefficient {i}')
    fig.tight_layout()
      
plt.savefig(f'Histogram_test.pdf', bbox_inches='tight')
plt.show()

Note that the links provided here seem to state that I have to convert my data to a pandas DataFrame. Can I get the plots to show up without converting my data to a pandas DataFrame?

I can get the plots to show up correctly if they are NOT seaborn histograms. However, I am having trouble getting them to show up when they are seaborn histogram plots.

Joe
  • Do I have to convert my data to a `pandas` DataFrame in order to get the plots to show up properly? I am currently getting empty plots in the rows x columns grid. The links you provided do not answer my question. – Joe Sep 02 '22 at 06:41
  • The first duplicate states very clearly that you must use `sns.histplot` with `axes`, not `sns.displot`. `sns.histplot` accepts `pandas.DataFrame`, `numpy.ndarray`, mapping, or sequence for `data`. The second duplicate shows how to correctly create and use subplots. Do not use `ax = plt.subplot(rows, cols, i+1)` with `fig, axes = plt.subplots(rows, cols, figsize=(45,12))`. That said, the duplicates do address the issues with your code. – Trenton McKinney Sep 02 '22 at 06:54
  • I tried this: `sns.set_style('darkgrid') i = np.arange(0,20,1) rows = 4 cols = 5 N_BINS = 60 fig, _ = plt.subplots(rows, cols, figsize=(45,12)) for i, ax in enumerate(fig.axes): sns.histplot(X_train[i, :], bins=60, kde=True) ax.set_title(f'Coefficient {i}') fig.tight_layout() plt.savefig(f'Histogram_test.pdf', bbox_inches='tight') plt.show()` but it draws all the histograms in the last subplot. – Joe Sep 02 '22 at 07:20
  • 1. `fig, axes = plt.subplots(4, 5, figsize=(45, 12), tight_layout=True)` 2. `for ax, data in zip(axes.flat, X_train): sns.histplot(data=data, bins=60, kde=True, ax=ax)` – Trenton McKinney Sep 02 '22 at 07:27
  • `g = sns.displot(data=X_train, height=6, aspect=2, kde=True)` This seems informative. – Trenton McKinney Sep 02 '22 at 07:33
  • There is a mistake in my comment with `sns.histplot`. It should use `zip(axes.flat, X_train.T)`. Note that `X_train.T` transposes the array, so there will be 20 groups of 500 observations, and each group is zipped to one of the 20 axes. – Trenton McKinney Sep 02 '22 at 07:40
  • WOW! Can you give me more insights as to what `g = sns.displot(data=X_train, height=6, aspect=2, kde=True)` does? Looks cool! – Joe Sep 02 '22 at 07:42
  • It is plotting the 20 observations on top of each other, which is what made me realize the small mistake. – Trenton McKinney Sep 02 '22 at 07:44
  • Or 1. `df = pd.DataFrame(X_train)` 2. `dfm = df.melt()` 3. `g = sns.displot(data=dfm, x='value', col='variable', col_wrap=4, kde=True, height=5)` – Trenton McKinney Sep 02 '22 at 07:47
  • That should have said, "It's plotting the 20 groups on top ...". – Trenton McKinney Sep 02 '22 at 07:56
  • @TrentonMcKinney, thanks so much for your elegant code and insights. Finally, how would I get just the first 10 elements with your code `g = sns.displot(data=X_train, height=6, aspect=2, kde=True)`? I tried using `X_train[:,10]` but that did not work. Thanks again! – Joe Sep 02 '22 at 08:05
  • `X_train[:, :10]` to get the first 10 groups. Feel free to upvote https://stackoverflow.com/a/63895570/7758804 and https://stackoverflow.com/a/69228859/7758804, since most of the code comes from those two answers. I'm going to bed now, it's 01:13. – Trenton McKinney Sep 02 '22 at 08:13

0 Answers