I am new to Python and have been practicing on a dummy dataset. I previously had trouble generating subplots and then plotting frequencies and proportions (%), but I have now worked those out. What I am still struggling with is the cosmetic side, especially the legends and the plot titles.
Here's the reproducible code generating the whole dummy dataset:
import pandas as pd
import matplotlib.pyplot as plt

d = {
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
}
CarWash = pd.DataFrame(data=d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
# Treat the 0/1 indicator columns as categorical
CarWash[categoricals] = CarWash[categoricals].astype('category')
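As a quick check of the numbers the plots are meant to display, the counts and row-wise proportions for one variable can be tabulated directly. This is only a sketch: it rebuilds two columns of the data so that it runs on its own, and using pd.crosstab in place of the groupby/pivot chain is an assumption made for illustration, not what the plotting code in this post does.

```python
import pandas as pd

# Stand-in for two columns of the CarWash data (values copied from the dataset above).
CarWash = pd.DataFrame({
    'Married':         [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

# Frequencies: rows are Married, columns are ReversedPayment.
counts = pd.crosstab(CarWash['Married'], CarWash['ReversedPayment'])

# Proportions within each row, so every row sums to 1.
props = pd.crosstab(CarWash['Married'], CarWash['ReversedPayment'], normalize='index')

print(counts)
print(props.round(3))
```

For Married, this gives counts of 14 and 3 in the unmarried group and 5 and 3 in the married group, which are the values the stacked bars should be labelled with.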
Below is my attempt to plot frequencies and proportions (%) for a side-by-side comparison:
plt.suptitle("Distribution of target variable across the categorical variables - frequencies # and proportions %")
nrow = 1
ncol = len(categoricals[:-1])
figure, axes = plt.subplots(nrow,ncol, figsize = (40,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    # plots frequencies
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i,columns = 'ReversedPayment').plot(kind = 'bar', stacked = True, ax = ax, sharey=True)
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)
    ax.set_title(i,fontsize = 30)
    ax.set_xlabel("")
    #ax.get_legend().remove()
    # labels data
    for p in ax.patches:
        x_adjust = 0.25
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        XY = (X,Y)
        if value != 0:
            ax.annotate(int(value),XY,fontsize = 25)
nrow = 1
ncol = len(categoricals[:-1])
figure, axes = plt.subplots(nrow,ncol, figsize = (40,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    # plots proportions
    CarWash.groupby([i,'ReversedPayment']).size().reset_index().pivot(index = i, columns = 'ReversedPayment').apply(lambda x: x/x.sum(),axis=1).plot(kind = 'bar', stacked = True, ax = ax)
    ax.tick_params(axis='both', labelsize = 25, labelrotation = 0)
    ax.set_title(i,fontsize = 30)
    ax.set_xlabel("")
    #ax.get_legend().remove()
    # labels data
    for p in ax.patches:
        x_adjust = 0.25/3
        value = p.get_height()
        X = p.get_x() + x_adjust
        Y = p.get_y() + p.get_height()/2
        XY = (X,Y)
        if value != 0:
            ax.annotate(str(round(value*100,1)) + "%",XY,fontsize = 25)
And below is my output:
So, my cosmetic problems are:
- Title: I tried adding one main title using plt.suptitle(), but it is not working as desired (I see no output). I also tried other things, but everything else threw errors.
- Legend: The legends look ugly and I do not need one on every subplot. A single legend shared by all the subplots would be great and would save space. I tried something like
plt.legend([CarWash[i],CarWash['ReversedPayment']], ['Blue', 'Orange'])
but it didn't work.
Any comments / suggestions are most welcome and much appreciated. Thank you.
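For the two points listed above, one possible direction is sketched here under stated assumptions. A likely reason the title does not appear is that plt.suptitle() is called before plt.subplots() creates the figure, so the title is attached to a different figure; calling suptitle on the figure object after it exists avoids that. For the legend, passing legend=False to each plot and then adding a single figure-level legend is one common pattern. The reduced dataset, the `predictors` list, the figure size, and the use of pd.crosstab are all assumptions made to keep the sketch short and self-contained; they are not part of the original code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Reduced stand-in for the CarWash data: two predictors plus the target.
CarWash = pd.DataFrame({
    'CollegeDegree':   [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married':         [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})
predictors = ['CollegeDegree', 'Married']  # hypothetical subset for illustration

# Create the figure first, then attach the main title to that figure.
figure, axes = plt.subplots(1, len(predictors), figsize=(12, 5), sharey=True)
figure.suptitle("Distribution of ReversedPayment across the categorical variables")

for col, ax in zip(predictors, axes.flatten()):
    counts = pd.crosstab(CarWash[col], CarWash['ReversedPayment'])
    # legend=False suppresses the per-subplot legend.
    counts.plot(kind='bar', stacked=True, ax=ax, legend=False)
    ax.set_title(col)
    ax.set_xlabel("")
    ax.tick_params(axis='x', labelrotation=0)

# One shared legend, built from the bar handles of the last subplot.
handles, labels = ax.get_legend_handles_labels()
figure.legend(handles, labels, title='ReversedPayment', loc='upper right')
```

Because every subplot uses the same two ReversedPayment categories in the same order, the handles from any one subplot should be enough to describe all of them.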