10

I am trying to run the code:

perm = PermutationImportance(clf).fit(X_test, y_test)
eli5.show_weights(perm)

to get an idea of which features are the most important in a model, but the output is

<IPython.core.display.HTML object> 

Any solutions or workarounds to this problem?

Thank you for your suggestions!

Evan

4 Answers

9

(Spyder maintainer here) There are no workarounds or solutions available at the moment (February 2019) to display web content in our consoles, sorry.

Note: We are considering how to make this possible, but most probably it won't be available until 2023.
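In the meantime, one way to sidestep HTML rendering in the console entirely is to format the explanation as plain text. A minimal sketch, assuming the fitted perm object from the question (see also the dataframe approach in the last answer):

import eli5

# explain_weights returns the same data that show_weights would render as HTML
explanation = eli5.explain_weights(perm)
print(eli5.format_as_text(explanation))   # plain-text table, works in any console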

Carlos Cordoba
5

A kludge is to just write out the HTML and display it in a browser:

import webbrowser

with open(r'C:\Temp\disppage.htm', 'wb') as f:   # use some reasonable temp name
    f.write(htmlobj.data.encode("UTF-8"))        # .data holds the raw HTML markup

# open the HTML file on my own (Windows) computer
url = r'C:\Temp\disppage.htm'
webbrowser.open(url, new=2)
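The same idea can be sketched more portably with the standard-library tempfile module, so no hard-coded Windows path is needed (the helper name show_in_browser is my own):

import tempfile
import webbrowser

def show_in_browser(html_obj):
    # write the IPython HTML object's markup to a temporary file
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False, encoding='utf-8') as f:
        f.write(html_obj.data)
        path = f.name
    # open it in the default browser
    webbrowser.open('file://' + path, new=2)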
J Hudock
4

Thanks for the idea, J Hudock. The following is my working example:

from sklearn.datasets import load_iris
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import eli5
from eli5.sklearn import PermutationImportance
from sklearn.model_selection import train_test_split
import webbrowser

# Load iris data & convert to dataframe
iris_data = load_iris()
data = pd.DataFrame({
    'sepal length': iris_data.data[:,0],
    'sepal width': iris_data.data[:,1],
    'petal length': iris_data.data[:,2],
    'petal width': iris_data.data[:,3],
    'species': iris_data.target
})
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]
y = data['species']

# Split train & test dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Initialize classifier
clf = RandomForestClassifier(n_estimators=56, max_depth=8, random_state=1, verbose=1)
clf.fit(X_train, y_train)

# Compute permutation feature importance
perm_importance = PermutationImportance(clf, random_state=0).fit(X_test, y_test)

# Store feature weights in an object
html_obj = eli5.show_weights(perm_importance, feature_names = X_test.columns.tolist())

# Write html object to a file (adjust file path; Windows path is used here)
with open(r'C:\Tmp\Desktop\iris-importance.htm', 'wb') as f:
    f.write(html_obj.data.encode("UTF-8"))

# Open the stored HTML file on the default browser
url = r'C:\Tmp\Desktop\iris-importance.htm'
webbrowser.open(url, new=2)
0

I have found a solution for Spyder:

import numpy as np
import eli5

# clf is assumed to be a fitted Pipeline with 'preprocessor', 'feature_selection'
# and 'classification' steps; categorical_features / numeric_features are the
# original column lists fed to the preprocessor.
clf.fit(X_train, y_train)
onehot_columns = list(clf.named_steps['preprocessor'].named_transformers_['cat']
                      .named_steps['onehot'].get_feature_names(input_features=categorical_features))
feature_names = list(numeric_features)
feature_names.extend(onehot_columns)
feature_names = np.array(feature_names)
selected_features_bool = list(clf.named_steps['feature_selection'].get_support(indices=False))
feature_names = list(feature_names[selected_features_bool])
eli5.format_as_dataframe(eli5.explain_weights(clf.named_steps['classification'], top=50,
                                              feature_names=feature_names))

As a result, it gave me the output as a dataframe:

0                       region_BAKI  0.064145
1           call_out_offnet_dist_w1  0.025365
2                         trf_Bolge  0.022637
3            call_in_offnet_dist_w1  0.018974
4        device_os_name_Proprietary  0.018608

...
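The same dataframe route also works directly on the permutation importances from the question. A minimal sketch, assuming the fitted perm object and the X_test dataframe from the earlier answer:

import eli5

# format_as_dataframe turns the explanation into a plain pandas DataFrame,
# which displays fine in the Spyder console
weights_df = eli5.format_as_dataframe(
    eli5.explain_weights(perm, feature_names=X_test.columns.tolist())
)
print(weights_df)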