I have a dataframe X with 30 variables, v1, v2, ..., v30, and col_name = [v1, v2, ..., v30].

For each variable, I want to plot a histogram to understand its distribution. However, writing the plotting code one variable at a time is too manual. Can I use something like a for loop to draw all 30 histograms, one under another, in one go?

For example:

for i in range(30):
  hist(np.array(X[col_name[i]]).astype(float),bins=100,color='blue',label=col_name[i],density=True,alpha=0.5)

How can I do that? I would like one page of graphs (each with a title and label) that I can scroll down to read.

tripleee
Hyoceansun
  • Yes you can use for loops to perform iterative tasks. – wwii Nov 24 '17 at 05:54
  • How to write the code? Not sure how to add iterative variable i to the subplot function.. Thanks! – Hyoceansun Nov 24 '17 at 05:56
  • I usually start with looking through the [gallery/examples](https://matplotlib.org/gallery/index.html) to find something with the feature I want and then look to see how they did it. Sometimes I need to spend time with one or more of the [tutorials](https://matplotlib.org/gallery/index.html), and often need to carefully read [the documentation](https://matplotlib.org/api/index.html) for the functions/methods I am trying to use. – wwii Nov 24 '17 at 05:59

1 Answer

You could do something like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Seed the generator so the sample data is reproducible
np.random.seed(0)

df = pd.DataFrame({
        'v1': np.random.normal(0, 3, 20),
        'v2': np.random.normal(0, 3, 20),
        'v3': np.random.normal(0, 3, 20),
        'v4': np.random.normal(0, 3, 20),
        'v5': np.random.normal(0, 3, 20),
        'v6': np.random.normal(0, 3, 20),
    })


# Generically define how many plots along and across
ncols = 3
nrows = int(np.ceil(len(df.columns) / (1.0*ncols)))
# squeeze=False keeps axes two-dimensional even with a single row or column
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(10, 10), squeeze=False)

# Lazy counter so we can remove unwanted axes
counter = 0
for i in range(nrows):
    for j in range(ncols):

        ax = axes[i][j]

        # Plot when we have data
        if counter < len(df.columns):

            ax.hist(df[df.columns[counter]], bins=10, color='blue', alpha=0.5, label='{}'.format(df.columns[counter]))
            ax.set_xlabel('x')
            ax.set_ylabel('Count')  # hist() plots raw counts unless density=True is passed
            ax.set_ylim([0, 5])
            leg = ax.legend(loc='upper left')
            leg.draw_frame(False)

        # Remove axis when we no longer have data
        else:
            ax.set_axis_off()

        counter += 1

plt.show()


Adapted from: How do I get multiple subplots in matplotlib?
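
A more compact variant of the same idea, as a rough sketch: flatten the axes array and pair each axis with a column name, so no manual counter is needed. The helper name `plot_histograms` and its `ncols` and `bins` parameters are made up for illustration, and the demo dataframe is random sample data, not the asker's X.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_histograms(df, ncols=1, bins=10):
    """Draw one titled histogram per column of df on a grid of subplots.

    Hypothetical helper for illustration; ncols=1 stacks the plots
    one under another so the page can be scrolled.
    """
    n = len(df.columns)
    nrows = -(-n // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols,
                             figsize=(5 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()

    # Pair each axis with a column; zip stops when the columns run out
    for ax, name in zip(flat_axes, df.columns):
        ax.hist(df[name].astype(float), bins=bins, color='blue', alpha=0.5)
        ax.set_title(name)
        ax.set_xlabel('value')
        ax.set_ylabel('count')

    # Hide any leftover axes in the last row
    for ax in flat_axes[n:]:
        ax.set_axis_off()

    fig.tight_layout()
    return fig, axes


# Sample data (assumed): 7 columns on a 3-wide grid leaves 2 unused axes
rng = np.random.default_rng(0)
demo = pd.DataFrame({'v{}'.format(i): rng.normal(0, 3, 20) for i in range(1, 8)})
fig, axes = plot_histograms(demo, ncols=3)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. For the 30-variable case in the question, something like `plot_histograms(X[col_name], ncols=1, bins=100)` should give one long column of plots, assuming `col_name` holds the column labels. pandas also provides `DataFrame.hist`, which draws one histogram per numeric column without an explicit loop.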

jonnybazookatone