How to plot multiple subplots using for loop?

Question

I am very new to Python. I have a dummy dataset (25 x 6) for practice. Of the 6 columns, 1 is a binary target variable and 5 are independent variables (4 categorical and 1 numeric). I am trying to view the distribution of my target by the values within each of the 4 categorical columns, without writing separate code for each column: I want to use a for loop so that I can scale it up to bigger datasets in the future. Something like below:

I have already managed that (image above), but I could only think of doing it with counters inside a for loop. I don't think this is elegant Python, and I'm pretty sure there is a better way, something like CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i,columns = 'ReversedPayment',values=0).axes.plot(kind='bar', stacked=True). I am struggling with how to set the ax = argument. Below is my inelegant (not scalable) Python code:

counter = 1
p = 0 
q = 0
fig,axes = plt.subplots(2,2,figsize=(15,10))
for i in categoricals[:-1]:
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i,columns = 'ReversedPayment',values=0).plot(kind='bar', stacked=True,ax = axes[p][q])
    counter = counter+1
    q = q+1
    if counter==3:
        q=0
        p = p+1

Here's the full data generation code:

import pandas as pd
import matplotlib.pyplot as plt

d = {
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0]}
CarWash = pd.DataFrame(data = d)

# The last entry of categoricals is the target column
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')

My other minor problem is getting data labels. Any comments, advice much appreciated. Thank you.

Please [do not post images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) of your data, a block of [code-formatted text with the output of `print(df)` is much more useful](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Cimbali, Jun 14 '21 at 17:38
Thanks for the suggestion - I just added the data generation code for full data — Scott Grammilo, Jun 14 '21 at 17:57

score 1 · Accepted Answer · edited Jun 30 '21 at 14:49

The best way to make your code less repetitive across many columns is to write a function that plots on a given axis. Then everything is driven by essentially 3 parameters:

ncols = 2
col_show = 'ReversedPayment'
col_subplots = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob']

Now we can compute the rest from there. Note that zip lets you iterate over several sequences at the same time, and axes.flat (the flat attribute of the NumPy array of axes) iterates over the 2D axes array as if it were 1D.

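# Ceiling division: enough rows to hold every subplot
nrows = (len(col_subplots) + ncols - 1) // ncols
fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(7.5 * ncols, 5 * nrows), sharey=True)
axes_it = axes.flat  # one iterator over the 2D axes array, reused by both loops

# plot_data is defined further down; define it before running this loop
for col, ax in zip(col_subplots, axes_it):
    plot_data(CarWash, col_show, col, ax)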

# If number of columns not multiple of ncols, hide remaining axes
for ax in axes_it:
    ax.axis('off')

plt.show()

In this case plot_data is so simple that it barely needs to be a function. But it is easy to extend this way, and it keeps the data logic somewhat separate from the rest, which is basically housekeeping.

DataFrame.value_counts() (pandas 1.1 or newer) does the same as GroupBy.size(), but it’s slightly more explicit.
unstack() pivots an index level to columns, which you did with .reset_index().pivot(). So now you have your column a (here always ReversedPayment) as columns and the other column as index.
Finally, .plot.bar() is the same as .plot(kind='bar'), ax specifies which axes to plot on, rot=0 avoids rotating the tick labels, and you already know stacked=True.

def plot_data(df, a, b, ax):
    counts = df[[a, b]].value_counts().unstack(a)
    counts.plot.bar(ax=ax, stacked=True, rot=0)
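
On pandas older than 1.1, where DataFrame.value_counts() is not available, the same table can be built with groupby, as suggested in the comments. A minimal sketch, assuming the hypothetical names count_table and plot_data_groupby (neither is from the original answer) and a tiny made-up frame to show the result:

```python
import pandas as pd


def count_table(df, a, b):
    """Count rows per (b, a) pair: one row per value of b, one column per value of a."""
    # groupby(...).size() counts rows per group; unstack(a) moves level `a` to columns.
    # fill_value=0 avoids NaN for combinations that never occur.
    return df.groupby([b, a]).size().unstack(a, fill_value=0)


def plot_data_groupby(df, a, b, ax):
    # Hypothetical drop-in replacement for plot_data on older pandas.
    count_table(df, a, b).plot.bar(ax=ax, stacked=True, rot=0)


# Tiny made-up frame, only to show the shape of the table.
demo = pd.DataFrame({"Married": [0, 0, 1, 1, 1],
                     "ReversedPayment": [0, 1, 0, 0, 1]})
table = count_table(demo, "ReversedPayment", "Married")
print(table)
```

The table has one row per value of Married and one column per value of ReversedPayment, which is the layout a stacked bar plot expects.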

As you can see, subplots(sharey=True) gives all plots the same scale on the y axis, which makes the various plots easier to compare.

The other advantage of using an iterator axes_it is that it continues where you stopped iterating on it. Suppose you had only 3 col_subplots: after the first loop there is 1 axes left, and the second loop calls ax.axis('off') on it.
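
The iterator behaviour can be checked without any plotting. The sketch below uses a 2x2 NumPy array of strings as a stand-in for the array of axes; the names grid, cols, used and leftover are made up for illustration:

```python
import numpy as np

# Stand-in for the 2x2 array of Axes returned by plt.subplots(2, 2).
grid = np.array([["ax00", "ax01"], ["ax10", "ax11"]])
cols = ["SeniorCitizen", "CollegeDegree", "Married"]  # only 3 columns for 4 slots

grid_it = grid.flat  # a single iterator over the flattened grid

# zip stops as soon as `cols` is exhausted; because `cols` comes first,
# no extra element is pulled from grid_it.
used = [(col, str(slot)) for col, slot in zip(cols, grid_it)]

# The same iterator resumes where the first loop stopped.
leftover = [str(slot) for slot in grid_it]

print(used)
print(leftover)
```

The order of the arguments to zip matters here: with the iterator first, zip would pull one more slot from it before noticing that cols is exhausted, and that slot would never be hidden.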

@Climbali - Thanks for the answer. Learnt something new today. Followup Q: How to annotate data labels in plot()? (% and #). % would be more useful. — Scott Grammilo, Jun 14 '21 at 23:11
You have a lot of examples here: https://matplotlib.org/stable/gallery/ticks_and_spines/tick-formatters.html With % you could do `ax.yaxis.set_major_formatter(matplotlib.ticker.PercentFormatter(xmax=100))` — Cimbali, Jun 14 '21 at 23:14
@Climbail - I am getting an error saying: 'DataFrame' object has no attribute 'value_counts'. When I am running the code `plot_data` section. Panda version: '0.25.3' — Scott Grammilo, Jun 14 '21 at 23:57
Yes it’s from a newer version. You can upgrade or stick with `GroupBy([a, b]).size()` — Cimbali, Jun 15 '21 at 00:08
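
For the follow-up about data labels, one possible sketch converts the counts to row percentages and labels each bar segment with Axes.bar_label (available from Matplotlib 3.4). The helper names row_percentages and plot_percent are hypothetical, and the small frame is made up for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter


def row_percentages(df, a, b):
    """Share of each value of `a` within each value of `b`, in percent."""
    counts = df.groupby([b, a]).size().unstack(a, fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0) * 100


def plot_percent(df, a, b, ax):
    pct = row_percentages(df, a, b)
    pct.plot.bar(ax=ax, stacked=True, rot=0)
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=100))
    # One bar container per column of `pct`; label every segment at its centre.
    for container in ax.containers:
        ax.bar_label(container, fmt="%.0f%%", label_type="center")


# Made-up data, only to exercise the helpers.
demo = pd.DataFrame({"Married": [0, 0, 1, 1, 1, 1],
                     "ReversedPayment": [0, 1, 0, 0, 0, 1]})
pct = row_percentages(demo, "ReversedPayment", "Married")
fig, ax = plt.subplots()
plot_percent(demo, "ReversedPayment", "Married", ax)
```

Call plt.show() afterwards to display the figure. In the subplot loop of the answer, plot_percent could take the place of plot_data, since it has the same arguments.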
