
I am working in a Jupyter notebook to present some plots. I have looked everywhere for an answer with no luck. I have the following dataset (this is a sample; the original dataset is larger, with 64 columns and 32 rows):

import numpy as np
import pandas as pd

labels = ['hc', 'svppa', 'nfvppa', 'lvppa']

# 20 rows: 5 per label, with 7 normally distributed measurement columns
test_df = {"id": list(range(1, 21)), "label": list(np.repeat(labels, 5))}
for i in range(1, 8):
    test_df[f"col{i}"] = list(np.random.normal(100, 10, size=20))
df = pd.DataFrame(test_df)

So it looks like this:

Data Frame.head()

Now what I want to do is to plot the probability plots to test for normality using:

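import matplotlib.pyplot as plt
from scipy import stats

columns = list(df.columns[2:])  # skip 'id' and 'label'
for col in columns:
    for label in labels:
        # one separate figure per (column, label) pair
        stats.probplot(df[df['label']==label][col], dist='norm', plot=plt)
        plt.title("Probability plot " + col + " - " + label)
        plt.show()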

This creates the plots I want, but as one separate figure per column and label, which is not presentable. I wanted to arrange them in a grid using matplotlib's subplots, but I can't figure out how to make stats.probplot draw into a given subplot.

I have tried the following (and different iterations) with no luck:

fig, axes = plt.subplots(nrows=len(columns),4 , figsize= (15,15), sharex=True, sharey=True )
plt.subplots_adjust(hspace=0.5)
axes=axes.ravel()

for n, label in enumerate(label):
    for col in columns:
        b = stats.probplot(df[df['label']==label][col], dist='norm', plot=plt(axes[n]))

Any ideas will be much appreciated!

EduardoRod

1 Answer


Suggestion based on https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html and method 2 at https://engineeringfordatascience.com/posts/matplotlib_subplots/:

plt.figure(figsize=(45, 12))
plt.subplots_adjust(hspace=0.5)

for n, label in enumerate(labels):
    for m, col in enumerate(columns):
        # subplot indices are 1-based and run row by row: one row per label, one column per variable
        ax = plt.subplot(len(labels), len(columns), n * len(columns) + m + 1)
        # passing the Axes as `plot` makes probplot draw into that subplot
        b = stats.probplot(df[df['label']==label][col], dist='norm', plot=ax)
        ax.set_title("Probability plot " + col + " - " + label)
# set once after the loop; based on https://www.statology.org/matplotlib-subplot-spacing/ to fix title of subplot overlapping with x-axes label
plt.subplots_adjust(top=1.15)
plt.savefig('test_save.png', bbox_inches="tight") # based on https://stackoverflow.com/a/47956856/8508004

You can see it in action at the bottom of the page here.

Keep in mind that because the figure is large, it won't look good in the notebook's preview output; you'll need to open the full-resolution image that gets saved.
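For comparison, here is a minimal sketch of the same grid built with plt.subplots, which is closer to what the question attempted. It keeps the axes as a 2D array and indexes them by (row, column) rather than flattening them, and passes each Axes object directly as the plot argument of stats.probplot. The seeded random data, the figure size, the shortened titles and the output filename probplots_grid.png are assumptions made for illustration, not part of the original posts.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Sample data shaped like the question's: 4 labels x 5 rows, 7 value columns.
# The seed is an assumption so the sketch is reproducible.
rng = np.random.default_rng(0)
labels = ['hc', 'svppa', 'nfvppa', 'lvppa']
data = {"id": list(range(1, 21)), "label": list(np.repeat(labels, 5))}
for i in range(1, 8):
    data[f"col{i}"] = rng.normal(100, 10, size=20)
df = pd.DataFrame(data)
columns = list(df.columns[2:])  # skip 'id' and 'label'

# One row per label, one column per variable; squeeze=False keeps axes 2D
# even when there is a single row or column.
fig, axes = plt.subplots(
    nrows=len(labels),
    ncols=len(columns),
    figsize=(4 * len(columns), 3 * len(labels)),  # illustrative size
    squeeze=False,
)

for n, label in enumerate(labels):
    for m, col in enumerate(columns):
        ax = axes[n, m]
        # probplot accepts an Axes object as `plot` and draws into it
        stats.probplot(df.loc[df['label'] == label, col], dist='norm', plot=ax)
        # probplot sets its own default title, so override it afterwards
        ax.set_title(f"{col} - {label}")

fig.tight_layout()  # space the panels so titles and axis labels do not collide
fig.savefig("probplots_grid.png", bbox_inches="tight")  # hypothetical filename
```

With the full 64-column dataset the grid would get very wide, so it may be more practical to plot the columns in batches, one figure per batch.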

Wayne