I use the following function to plot the feature importances of my model:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_feature_importance(imp_df):
    # Expects two columns: feature names first, then importance values
    imp_df.columns = ['feature', 'feature_importance']
    plt.figure(figsize=(15, 16))
    b = sns.barplot(x='feature_importance', y='feature', data=imp_df, orient='h', color='royalblue')
    b.set_xlabel("feature importance", fontsize=30)
    b.set_ylabel("feature", fontsize=30)
    b.tick_params(labelsize=26)
    plt.title("Random Forest feature importance", fontsize=35)
    #plt.xlabel('feature_importances', fontsize=16)
    #plt.ylabel('feature', fontsize=16)
    plt.tight_layout()
I call it whenever I need to plot feature importances.
The problem is that I want to keep the x-axis scaling consistent from one plot to the next. For example, here is the call for the first of two models:
data1 =\
{'feature': ['sp_max', 'sp_p85', 'sp_mean', 'sp_std', 'sp_p75', 'sp_iqr', 'sp_median', 'sp_mad', 'sp_p25', 'br_p85', 'jk_std', 'ac_std', 'sp_min',
'jk_min', 'jk_p85', 'ac_mad', 'jk_max', 'jk_p75', 'ac_p85', 'br_p75', 'br_iqr', 'ac_mean', 'br_std', 'ac_max', 'jk_iqr', 'ac_min',
'br_mean', 'ac_p25', 'ac_iqr', 'jk_mad', 'ac_p75', 'jk_mean', 'br_max', 'jk_p25', 'jk_median', 'br_median', 'ac_median', 'br_mad', 'br_p25', 'br_min'],
'feature_importance': [0.0713905329, 0.0614052325, 0.0537031799, 0.0522208471, 0.0447700292, 0.0365220941, 0.0338786688, 0.0315128429,
0.0313193327, 0.0302328929, 0.0277040796, 0.0250055989, 0.0246700799, 0.0241926779, 0.0231924589, 0.0230042172,
0.0217812924, 0.0211596946, 0.0209208033, 0.020761991, 0.0199109668, 0.0195804683, 0.0194071674, 0.0192247004,
0.0191562826, 0.0190183126, 0.0187932756, 0.0187916575, 0.0186561271, 0.0185529459, 0.0182522133, 0.0178571215,
0.0172595608, 0.0161357392, 0.0135001044, 0.0129659029, 0.0121831595, 0.0114491594, 0.00994391099, 1.2677449e-05]}
feature_importance1 = pd.DataFrame(data1)
# function call
plot_feature_importance(feature_importance1)
And a call to the second model:
data2 =\
{'feature': ['trip-distance', 'sp_max', 'sp_p85', 'sp_std', 'sp_mean', 'sp_p75', 'sp_iqr', 'sp_median', 'br_p85', 'sp_mad', 'sp_p25',
'jk_std', 'jk_p85', 'sp_min', 'ac_std', 'jk_min', 'ac_mad', 'jk_p75', 'br_p75', 'ac_p85', 'br_iqr', 'jk_mad', 'jk_max',
'jk_iqr', 'ac_p25', 'ac_p75', 'ac_iqr', 'ac_mean', 'ac_min', 'br_std', 'ac_max', 'br_mean', 'jk_p25', 'jk_mean', 'br_max',
'jk_median', 'br_median', 'ac_median', 'br_mad', 'br_p25', 'br_min'],
'feature_importance': [0.179681943, 0.0626488215, 0.0548207093, 0.0465467404, 0.0462552278, 0.0393529557, 0.0310985549, 0.0298559465,
0.0257461545, 0.0254299245, 0.0247697614, 0.0215061763, 0.019717727, 0.0193190652, 0.0192614463, 0.0191363098,
0.0187974266, 0.0179852664, 0.0178322731, 0.0172595835, 0.0169402994, 0.0160361741, 0.016030292, 0.0159925654,
0.014725113, 0.0146868079, 0.0143054259, 0.0142270777, 0.0139906464, 0.0138775454, 0.0138539903, 0.0134992019,
0.0131361466, 0.0129263598, 0.0122852486, 0.0104203687, 0.0101486977, 0.00985276389, 0.00931697107, 0.00671582934,
1.04610595e-05]}
feature_importance2 = pd.DataFrame(data2)
# function call
plot_feature_importance(feature_importance2)
Both plots render fine, but I want the x-axis to keep a consistent scale across them, with tick labels in a 2-decimal format (as in the first plot).
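To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind. It is not a confirmed solution. The helper names `shared_ticks` and `apply_fixed_scale` and the 0.02 tick step are my own assumptions, and the sketch assumes matplotlib 3.3 or newer, where `set_major_formatter` accepts a format string.

```python
import math


def shared_ticks(*importance_columns, step=0.02):
    """Return tick positions 0, step, 2*step, ... covering every value passed in.

    `step` is an assumed spacing; pick whatever reads well at your font size.
    """
    largest = max(max(column) for column in importance_columns)
    n_steps = math.ceil(largest / step)
    # round() hides floating-point noise such as 0.18000000000000002
    return [round(i * step, 10) for i in range(n_steps + 1)]


def apply_fixed_scale(ax, ticks):
    """Give `ax` the x-limits, tick positions, and 2-decimal labels from `ticks`."""
    ax.set_xlim(ticks[0], ticks[-1])
    ax.set_xticks(ticks)
    # Assumes matplotlib >= 3.3, which accepts a str.format-style string here
    ax.xaxis.set_major_formatter("{x:.2f}")
```

The idea would be to compute `ticks` once from the `feature_importance` column of both DataFrames, then call `apply_fixed_scale(plt.gca(), ticks)` right after each `plot_feature_importance(...)` call, so both figures share the same limits and label format.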