I use the following function to plot the feature importances of my model.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_feature_importance(imp_df):
    # Expects two columns: feature name first, then its importance.
    imp_df.columns = ['feature', 'feature_importance']
    plt.figure(figsize=(15, 16))
    b = sns.barplot(x='feature_importance', y='feature', data=imp_df, orient='h', color='royalblue')
    b.set_xlabel("feature importance", fontsize=30)
    b.set_ylabel("feature", fontsize=30)
    b.tick_params(labelsize=26)

    plt.title("Random Forest feature importance", fontsize=35)
    plt.tight_layout()

I call it whenever I plot feature importance.

The problem is that I want consistent scaling on the x-axis across plots. For example, consider calling the function for these two models.

data1 =\
{'feature': ['sp_max', 'sp_p85', 'sp_mean', 'sp_std', 'sp_p75', 'sp_iqr', 'sp_median', 'sp_mad', 'sp_p25', 'br_p85', 'jk_std', 'ac_std', 'sp_min',
             'jk_min', 'jk_p85', 'ac_mad', 'jk_max', 'jk_p75', 'ac_p85', 'br_p75', 'br_iqr', 'ac_mean', 'br_std', 'ac_max', 'jk_iqr', 'ac_min',
             'br_mean', 'ac_p25', 'ac_iqr', 'jk_mad', 'ac_p75', 'jk_mean', 'br_max', 'jk_p25', 'jk_median', 'br_median', 'ac_median', 'br_mad', 'br_p25', 'br_min'],
 'feature_importance': [0.0713905329, 0.0614052325, 0.0537031799, 0.0522208471, 0.0447700292, 0.0365220941, 0.0338786688, 0.0315128429,
                        0.0313193327, 0.0302328929, 0.0277040796, 0.0250055989, 0.0246700799, 0.0241926779, 0.0231924589, 0.0230042172,
                        0.0217812924, 0.0211596946, 0.0209208033, 0.020761991, 0.0199109668, 0.0195804683, 0.0194071674, 0.0192247004,
                        0.0191562826, 0.0190183126, 0.0187932756, 0.0187916575, 0.0186561271, 0.0185529459, 0.0182522133, 0.0178571215,
                        0.0172595608, 0.0161357392, 0.0135001044, 0.0129659029, 0.0121831595, 0.0114491594, 0.00994391099, 1.2677449e-05]}

feature_importance1 = pd.DataFrame(data1)

# function call
plot_feature_importance(feature_importance1)

[Image: feature importance plot for the first model]

And a call for the second model:

data2 =\
{'feature': ['trip-distance', 'sp_max', 'sp_p85', 'sp_std', 'sp_mean', 'sp_p75', 'sp_iqr', 'sp_median', 'br_p85', 'sp_mad', 'sp_p25',
             'jk_std', 'jk_p85', 'sp_min', 'ac_std', 'jk_min', 'ac_mad', 'jk_p75', 'br_p75', 'ac_p85', 'br_iqr', 'jk_mad', 'jk_max',
             'jk_iqr', 'ac_p25', 'ac_p75', 'ac_iqr', 'ac_mean', 'ac_min', 'br_std', 'ac_max', 'br_mean', 'jk_p25', 'jk_mean', 'br_max',
             'jk_median', 'br_median', 'ac_median', 'br_mad', 'br_p25', 'br_min'],
 'feature_importance': [0.179681943, 0.0626488215, 0.0548207093, 0.0465467404, 0.0462552278, 0.0393529557, 0.0310985549, 0.0298559465,
                        0.0257461545, 0.0254299245, 0.0247697614, 0.0215061763, 0.019717727, 0.0193190652, 0.0192614463, 0.0191363098,
                        0.0187974266, 0.0179852664, 0.0178322731, 0.0172595835, 0.0169402994, 0.0160361741, 0.016030292, 0.0159925654,
                        0.014725113, 0.0146868079, 0.0143054259, 0.0142270777, 0.0139906464, 0.0138775454, 0.0138539903, 0.0134992019,
                        0.0131361466, 0.0129263598, 0.0122852486, 0.0104203687, 0.0101486977, 0.00985276389, 0.00931697107, 0.00671582934,
                        1.04610595e-05]}

feature_importance2 = pd.DataFrame(data2)

# function call
plot_feature_importance(feature_importance2)

[Image: feature importance plot for the second model]

Both plots render fine, but I want the x-axis to keep a consistent scale, with tick labels in a 2-decimal format (as in the first plot).
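To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution: pass the same x-axis upper limit to every plot and format the tick labels with two decimals. It uses plain matplotlib `barh` instead of seaborn to keep the example small; the helper name `plot_importance_fixed_scale`, its `xmax` parameter, and the 5% right margin are assumptions made for illustration. The toy values are the first three entries of `data1` and `data2`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr


def plot_importance_fixed_scale(features, importances, xmax, title=""):
    """Hypothetical helper: horizontal bars with a caller-supplied x limit."""
    fig, ax = plt.subplots(figsize=(15, 16))
    ax.barh(features, importances, color="royalblue")
    ax.invert_yaxis()  # first feature at the top, like the seaborn plot
    ax.set_xlim(0, xmax)  # identical limits for every model
    ax.xaxis.set_major_formatter(tkr.FormatStrFormatter("%.2f"))  # 2 decimals
    ax.set_xlabel("feature importance")
    ax.set_ylabel("feature")
    ax.set_title(title)
    fig.tight_layout()
    return ax


# First three importances of each model, copied from data1 and data2.
imp1 = [0.0713905329, 0.0614052325, 0.0537031799]
imp2 = [0.179681943, 0.0626488215, 0.0548207093]

# One shared limit: the largest importance in either model plus a 5% margin
# (the margin is an arbitrary choice for this sketch).
shared_xmax = max(max(imp1), max(imp2)) * 1.05

ax1 = plot_importance_fixed_scale(["sp_max", "sp_p85", "sp_mean"], imp1, shared_xmax, "model 1")
ax2 = plot_importance_fixed_scale(["trip-distance", "sp_max", "sp_p85"], imp2, shared_xmax, "model 2")
```

With a shared limit, bars of equal importance have equal length in both figures, at the cost of the first model's bars looking short next to `trip-distance`.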

Trenton McKinney
Amina Umar

  • For example: `b.yaxis.set_major_locator(matplotlib.ticker.MultipleLocator(0.01))` on the line after `b.tick_params(labelsize=26)`. – JohanC Nov 10 '22 at 19:47
  • The option from @JohanC doesn't quite work, as shown in this [plot](https://i.stack.imgur.com/phxlG.png). However, `b.xaxis.set_major_formatter(tkr.FormatStrFormatter('%.2f'))` does, as shown in this [plot](https://i.stack.imgur.com/xxIU9.png). Where `import matplotlib.ticker as tkr` – Trenton McKinney Nov 11 '22 at 18:50
  • @TrentonMcKinney Wouldn't `FormatStrFormatter('%.2f')` change `0.025` to `0.03`? It is not clear whether that would be desirable. Maybe a better approach could be to calculate the factor for the MultipleLocator for the x-axis via something like `np.round((xmax - xmin)/10, 2)`? – JohanC Nov 11 '22 at 19:07
  • What @JohanC says in this [comment](https://stackoverflow.com/questions/74393008/how-do-i-apply-consistent-scaling-in-my-figure#comment131354596_74393008) is true. To be clear, the initial proposed answer doesn't work, simply because there are too many ticks / labels, so they overlap. – Trenton McKinney Nov 11 '22 at 19:18
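The two points raised in the comments can be sketched as follows. This is only an illustration under stated assumptions: `tick_step` is a hypothetical helper, the divisor of 10 comes from the `np.round((xmax - xmin)/10, 2)` suggestion, and the floor of 0.01 is an added guard so the step can never round down to zero. Plain matplotlib `barh` stands in for the seaborn call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np

# '%.2f' rounds to two decimals, so a tick located at 0.025 is labelled 0.03.
label = "%.2f" % 0.025


def tick_step(xmin, xmax, n_ticks=10):
    """Hypothetical helper: tick spacing rounded to 2 decimals, at least 0.01."""
    return max(float(np.round((xmax - xmin) / n_ticks, 2)), 0.01)


fig, ax = plt.subplots()
# First two importances of the second model, copied from data2.
ax.barh(["trip-distance", "sp_max"], [0.179681943, 0.0626488215], color="royalblue")

xmin, xmax = ax.get_xlim()
step = tick_step(xmin, xmax)

# Ticks sit on multiples of `step`, which is itself a multiple of 0.01,
# so the 2-decimal labels show the tick positions without rounding them.
ax.xaxis.set_major_locator(tkr.MultipleLocator(step))
ax.xaxis.set_major_formatter(tkr.FormatStrFormatter("%.2f"))
```

Because the step scales with the axis range, the number of ticks stays near ten, which should avoid the overlapping labels seen with a fixed `MultipleLocator(0.01)`.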
