
I am new to Python and have been practicing on a dummy dataset. I previously had trouble generating subplots and plotting frequencies and proportions (%), but I have now solved those problems. What I am still struggling with is the cosmetics, especially the legends and the plot titles.

Here's the reproducible code generating the whole dummy dataset:

import pandas as pd
import matplotlib.pyplot as plt

d = {
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] , 
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] , 
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] , 
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] , 
    'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] , 
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)


categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')

Below is my attempt to produce frequencies and proportions %, for a side by side comparison:

plt.suptitle("Distribution of target variable across the categorical variables - frequencies # and proportions %")

nrow = 1
ncol = len(categoricals[:-1])
figure, axes = plt.subplots(nrow,ncol, figsize = (40,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    
    # plots frequencies
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i,columns = 'ReversedPayment').plot(kind = 'bar', stacked = True, ax = ax, sharey=True)
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)   
    ax.set_title(i,fontsize = 30)
    ax.set_xlabel("")
    #ax.get_legend().remove()
    # labels data
    for p in ax.patches:       
        x_adjust = 0.25
        
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        
        XY = (X,Y)
        if value != 0:
            ax.annotate(int(value),XY,fontsize = 25)


nrow = 1
ncol = len(categoricals[:-1])
figure, axes = plt.subplots(nrow,ncol, figsize = (40,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    
    # plots proportions    
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i, columns = 'ReversedPayment').apply(lambda x: x/x.sum(),axis=1).plot(kind = 'bar', stacked = True, ax = ax)
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)   
    ax.set_title(i,fontsize = 30)
    ax.set_xlabel("")
    #ax.get_legend().remove()
    # labels data
    for p in ax.patches:       
        x_adjust = 0.25/3
        
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        
        XY = (X,Y)
        if value != 0:
            ax.annotate(str(round(value*100,1)) + "%",XY,fontsize = 25)

And below is my output:

(Image: the two rows of stacked bar subplots produced by the code above.)

My cosmetic problems are:

  1. Title: I tried adding one main title with plt.suptitle(), but it does not work as desired (no title appears in the output). I also tried other things, but everything else threw errors.
  2. Legend: The legends look ugly and I do not need one on every subplot. A single legend for all the subplots would be ideal and would save space. I tried something like plt.legend([CarWash[i],CarWash['ReversedPayment']], ['Blue', 'Orange']) but it didn't work.

Any comments / suggestions are most welcome and much appreciated. Thank you.

Scott Grammilo
  • try figure.suptitle("Distribution of target variable across the categorical variables - frequencies # and proportions %",fontsize=20) – virxen Jun 19 '21 at 01:23
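
A minimal sketch of why the comment's suggestion helps, assuming nothing beyond matplotlib itself (the "Agg" backend and the short title text are choices made only for this illustration): plt.suptitle called before plt.subplots titles an implicitly created figure, while figure.suptitle titles the figure that actually holds the subplots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only so the sketch runs anywhere
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state

# Calling plt.suptitle first implicitly creates a figure and titles that one ...
plt.suptitle("Main title")
implicit_fig = plt.gcf()

# ... and plt.subplots then creates a second, separate figure without a title.
figure, axes = plt.subplots(1, 4, figsize=(12, 3))
print(implicit_fig is figure)  # the two figures are different objects

# Titling the figure object itself does not depend on call order.
title = figure.suptitle("Main title", fontsize=20)
print(title.get_text())
```

Calling the method on the figure returned by plt.subplots makes it explicit which figure receives the title, which matters here because the question creates two figures in a row.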

2 Answers


You call plt.suptitle before the figure for the plots is created, so the title is attached to a different figure and does not appear in the final plot. For the legend, see the existing answers on how to create a common legend for several subplots: collect the handles and labels from one axes and pass them to fig.legend. Putting this together, you can try the following:

nrow = 2
ncol = len(categoricals[:-1])
fig, axes = plt.subplots(nrow,ncol, figsize = (40,20))
plt.suptitle("Distribution of target variable across the categorical variables - frequencies # and proportions %", size=40, y=1.01)

for i,ax in zip(categoricals[:-1],axes.flatten()):
    
    # plots frequencies
    df = CarWash[[i,'ReversedPayment']].value_counts().unstack().fillna(0)
    ax.bar(df.index, df[0], label='0')
    ax.bar(df.index, df[1], bottom=df[0], label='1')
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)   
    ax.set_title(i,fontsize = 30, y=1.05)
    ax.set_xticks([0,1])
    
    # labels data
    for p in ax.patches:       
        x_adjust = 0.35
        
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        
        XY = (X,Y)
        if value != 0:
            ax.annotate(int(value),XY,fontsize = 25)
                       
for i,ax in zip(categoricals[:-1],axes.flatten()[ncol:]):
    
    # plots proportions
    df = CarWash[[i,'ReversedPayment']].value_counts().unstack().fillna(0)
    df = (df.T/df.T.sum()).T*100
    ax.bar(df.index, df[0], label='0')
    ax.bar(df.index, df[1], bottom=df[0], label='1')
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)   
    ax.set_title(i,fontsize = 30, y=1.05)
    ax.set_xticks([0,1])
    ax.set_ylim(0,110)

    # labels data
    for p in ax.patches:       
        x_adjust = 0.25
        
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        
        XY = (X,Y)
        if value != 0:
            ax.annotate(str(round(value,1)) + "%",XY,fontsize = 25)
            
plt.subplots_adjust(hspace=0.4)

# plot legend
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc=(0.95, 0.1), prop={'size': 30})
plt.show()

It gives a 2x4 grid of stacked bar plots (frequencies on top, proportions below) with one figure-level title and one shared legend.
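
The shared-legend part can also be shown in isolation. The following is only a sketch on hypothetical toy data (the toy frame, the features list, and the title text are made up for illustration, and pd.crosstab is swapped in for the value_counts().unstack() step): each subplot is drawn without its own legend, and the handles from one axes are passed to fig.legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the CarWash frame.
toy = pd.DataFrame({
    "Married":         [0, 0, 0, 1, 1, 1, 1, 0],
    "FulltimeJob":     [1, 1, 0, 0, 1, 1, 0, 1],
    "ReversedPayment": [0, 1, 0, 0, 1, 0, 1, 0],
})
features = ["Married", "FulltimeJob"]

fig, axes = plt.subplots(1, len(features), figsize=(10, 4))
for name, ax in zip(features, axes.flatten()):
    # Count table: rows are feature levels, columns are ReversedPayment levels.
    counts = pd.crosstab(toy[name], toy["ReversedPayment"])
    # legend=False suppresses the per-subplot legend; the bars keep their labels.
    counts.plot(kind="bar", stacked=True, ax=ax, legend=False)
    ax.set_title(name)
    ax.set_xlabel("")

fig.suptitle("ReversedPayment by feature")

# All subplots share the same two series, so one axes supplies the handles.
handles, labels = axes.flatten()[0].get_legend_handles_labels()
fig.legend(handles, labels, title="ReversedPayment", loc="center right")
```

Depending on the figure size, the legend may overlap the last subplot; plt.subplots_adjust(right=...) is one way to leave room for it.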

bb1

Perhaps you can move the title into the loop and add the legend after the iteration, as shown below.

nrow = 1
ncol = len(categoricals[:-1])
figure, axes = plt.subplots(nrow,ncol, figsize = (20,5))


for i,ax in zip(categoricals[:-1],axes.flatten()):    
    # plots frequencies
    plt.suptitle("Distribution of target variable across the categorical variables - frequencies # and proportions %")
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i,columns = 'ReversedPayment').plot(kind = 'bar', stacked = True, ax = ax, sharey=True)
    ax.tick_params(axis='both', labelsize = 15, labelrotation = 0)   
    ax.set_title(i,fontsize = 15)
    ax.set_xlabel("")
    legend = ax.legend()
    #ax.get_legend().remove()
    # labels data
    for p in ax.patches:       
        x_adjust = 0.25
    
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
    
        XY = (X,Y)
        if value != 0:
            ax.annotate(int(value),XY,fontsize = 15)    
    # removes the per-subplot legend (once per axes is enough)
    legend.remove()

ax.legend(loc=2, bbox_to_anchor=(-4.2,1.1),fontsize=10) 

The output plot still needs some adjustments: the second row of subplots has to be added, and the positioning and sizing need tuning.
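
For the proportions row, one possible shortcut is sketched below. It is not the approach used in this thread: it assumes matplotlib 3.4 or newer for Axes.bar_label, swaps in pd.crosstab with normalize="index" for the groupby/pivot chain, and uses a hypothetical toy frame instead of CarWash.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the CarWash frame.
toy = pd.DataFrame({
    "Married":         [0, 0, 0, 1, 1, 1, 1, 0],
    "ReversedPayment": [0, 1, 0, 0, 1, 0, 1, 0],
})

# Row-wise shares in percent: each Married level sums to 100.
shares = pd.crosstab(toy["Married"], toy["ReversedPayment"], normalize="index") * 100

fig, ax = plt.subplots(figsize=(5, 4))
shares.plot(kind="bar", stacked=True, ax=ax, legend=False)
ax.tick_params(axis="x", labelrotation=0)
ax.set_xlabel("")

# One container per stacked series; label each segment at its centre.
for container in ax.containers:
    texts = [f"{h:.1f}%" if h else "" for h in container.datavalues]
    ax.bar_label(container, labels=texts, label_type="center")
```

With label_type="center" the text is placed in the middle of each segment, so no manual x_adjust offset is needed.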

For scientific plots involving many labels, legends, and appendices, you could also use another Python-driven program called Veusz, which is also very easy to use.

Eureka JX