For loop to plot the top n feature importances in Bokeh (Python) without explicitly typing the column names

Question

I want to plot the top n features of a RandomForestClassifier() in bokeh without specifying the column names explicitly in the y variable.

So firstly, instead of my typing the column name into the y argument, the plot should take the column name (and its values) directly from the top feature of the RandomForestClassifier.

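import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

y = df['new']
x = df.drop('new', axis=1)
rf = RandomForestClassifier()
rf.fit(x, y)

# Extract the top feature from above and plot it in bokeh

source = ColumnDataSource(df)

p1 = figure(y_range=(0, 10))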

# Below, I would like y to be the top feature from the
# RandomForestClassifier instead of the explicitly written
# column name, horsePower

p1.line(
    x = 'x',
    y = 'horsePower', 
    source=source,
    legend = 'Car Blue',
    color = 'Blue'
 )

Then, rather than plotting only the first or only the second feature, I would like a for loop that plots the top n features in bokeh. I imagine something close to this:

for i in range(5):
    p.line(x = 'x', y = ???? , source=source,) #top feature in randomClassifier
    p.circle(x = 'x', y = ???? , source=source, size = 10)
    row = [p]

output_file('TopFeatures')
show(p)

I have already extracted the feature importances from the fitted RandomForestClassifier, sorted them in descending order, and printed the top 15 using

new_rf = pd.Series(rf.feature_importances_, index=x.columns).sort_values(ascending=False)

print(new_rf[:15])

Parfait · Accepted Answer · 2018-02-06T16:31:52.613

Simply iterate through the index values of the pandas Series new_rf, since its index holds the column names:

# TOP 1 FEATURE
p1.line(
    x = 'x',
    y = new_rf.index[0], 
    source = source,
    legend = 'Car Blue',
    color = 'Blue'
 )

# TOP 5 FEATURES (one output file and one figure per feature)
for col in new_rf[:5].index:

    output_file("TopFeatures_{}.html".format(col))

    p = figure(y_range=(0, 10))
    p.line(x = 'x', y = col, source = source)
    p.circle(x = 'x', y = col, source = source, size = 10)

    show(p)
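
If the goal is to overlay the top n features on one figure rather than write one file per feature, the same index-based loop can add glyphs to a single plot. The following is only a minimal sketch, not the answer's code: the DataFrame contents, the importance values, and the helper name plot_top_features are made up for illustration, and legend_label assumes Bokeh 1.4 or later (the snippets above use the older legend argument).

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure


def plot_top_features(df, importances, n=5, x_col="x"):
    """Overlay the n most important feature columns of df on one figure.

    `importances` is a pandas Series indexed by column name, such as the
    new_rf Series from the question. Hypothetical helper for illustration.
    """
    top = importances.sort_values(ascending=False).index[:n]
    source = ColumnDataSource(df)
    palette = Category10[10]  # ten distinct colors, reused cyclically
    p = figure(title="Top {} features".format(len(top)))
    for k, col in enumerate(top):
        color = palette[k % len(palette)]
        # The column name comes from the importance index, not a literal.
        p.line(x=x_col, y=col, source=source, color=color, legend_label=col)
        p.scatter(x=x_col, y=col, source=source, color=color, size=10)
    return p, list(top)


# Made-up data standing in for the real DataFrame and fitted model.
df = pd.DataFrame({
    "x": [0, 1, 2, 3],
    "horsePower": [1.0, 2.0, 3.0, 4.0],
    "weight": [4.0, 3.0, 2.0, 1.0],
    "cylinders": [2.0, 2.5, 2.0, 2.5],
})
importances = pd.Series({"horsePower": 0.5, "weight": 0.3, "cylinders": 0.2})

p, plotted = plot_top_features(df, importances, n=2)
```

Calling output_file("TopFeatures.html") followed by show(p) would then write and open the combined plot.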

edited Feb 06 '18 at 16:31

answered Feb 06 '18 at 16:25

Parfait

104,375
17
94
125

Sorry, I have a small follow-up question. How can I plot the top 5 entries of 'new_rf' in the same plot? The x axis should show the feature names and the y axis the feature importance. When I print 'new_rf', I get two columns: the first is the feature name and the second is the importance value, which is correct. It is of type 'pandas.core.series.Series'. – Feb 12 '18 at 10:50

Look into bokeh's [multi_line()](https://stackoverflow.com/questions/31520951/plotting-multiple-lines-with-bokeh-and-pandas). Maybe something like `p.multi_line(xs=[df['x']] * 5, ys=[df[col] for col in new_rf[:5].index])` – Parfait Feb 14 '18 at 19:21
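
For the follow-up in the comments (feature names on the x axis, importance on the y axis, all in one plot), a bar chart over a categorical x range may be a closer fit than multi_line(). This is a rough sketch under assumptions: the importance values are invented to stand in for new_rf, and the titles and axis labels are arbitrary.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Invented values standing in for new_rf (feature name -> importance).
new_rf = pd.Series({
    "horsePower": 0.30, "weight": 0.25, "cylinders": 0.20,
    "year": 0.10, "origin": 0.08, "acceleration": 0.07,
}).sort_values(ascending=False)

top = new_rf[:5]
names = list(top.index)

source = ColumnDataSource(data={"feature": names, "importance": top.tolist()})

# Passing a list of strings as x_range makes the x axis categorical.
p = figure(x_range=names, title="Top 5 feature importances")
p.vbar(x="feature", top="importance", width=0.8, source=source)
p.y_range.start = 0  # bars start at zero
p.xaxis.axis_label = "Feature"
p.yaxis.axis_label = "Importance"
```

As with the other figures, output_file(...) and show(p) would render it; the bars appear in the order given by names, so the sorted Series yields a descending chart.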
