I want to plot the top n features of a RandomForestClassifier() in bokeh without explicitly specifying the column name in the y argument.
First, instead of typing the column name for y by hand, I would like the plot to take the column name and values directly from the top feature of the random forest classifier.
    y = df['new']
    x = df.drop('new', axis=1)
    rf = RandomForestClassifier()
    rf.fit(x, y)

    # Extract the top feature from above and plot in bokeh
    source = ColumnDataSource(df)
    p1 = figure(y_range=(0, 10))

    # below I would like it to use the top feature in RandomClassifier
    # instead of explicitly writing the column name, horsePower,
    # from the top features column
    p1.line(
        x = 'x',
        y = 'horsePower',
        source=source,
        legend = 'Car Blue',
        color = 'Blue'
    )
Instead of specifying only the first feature or only the second feature, we can build a for loop that plots the top n features in bokeh. I imagine it to be something close to this:

    for i in range(5):
        p.line(x = 'x', y = ???? , source=source,)  # top feature in randomClassifier
        p.circle(x = 'x', y = ???? , source=source, size = 10)

    row = [p]
    output_file('TopFeatures')
    show(p)
I have already extracted the feature importances from the fitted RandomForestClassifier, sorted them in descending order, and printed the top 15 using:

    new_rf = pd.Series(rf.feature_importances_, index=x.columns).sort_values(ascending=False)
    print(new_rf[:15])
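Since the sorted Series is indexed by column name, one possible approach is to take its first n index labels and pass each one as the y field inside the loop. The following is only a minimal sketch of that idea, not a definitive solution. The data is a made-up stand-in, the extra column names (weight, mpg, age) and the row-number 'x' column are assumptions for illustration, and legend_label assumes a bokeh version that supports it (1.4 or later). It also uses p.scatter in place of p.circle, which newer bokeh releases prefer for sized markers.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 'new' is the target, the other columns are features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 4)),
                  columns=["horsePower", "weight", "mpg", "age"])
df["new"] = (df["horsePower"] > 0.5).astype(int)

y = df["new"]
x = df.drop("new", axis=1)
rf = RandomForestClassifier(random_state=0)
rf.fit(x, y)

# Same Series as in the question: importances indexed by column name, sorted.
new_rf = pd.Series(rf.feature_importances_,
                   index=x.columns).sort_values(ascending=False)

n = 3  # number of top features to plot
top_features = list(new_rf.index[:n])  # column names, most important first

# Assumption: plot each feature against the row number, stored as column 'x'.
plot_df = df.copy()
plot_df["x"] = np.arange(len(plot_df))
source = ColumnDataSource(plot_df)

p = figure(title="Top features by importance")

# One line plus one set of markers per top feature; the column name
# comes from the importance ranking instead of being typed by hand.
for name, color in zip(top_features, Category10[10]):
    p.line(x="x", y=name, source=source, color=color, legend_label=name)
    p.scatter(x="x", y=name, source=source, size=10, color=color)
```

To display the result, call output_file('TopFeatures.html') and show(p) from bokeh.plotting. For just the single top feature, new_rf.index[0] gives the column name to pass as y.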