Following this Tutorial and Feature Importance, I tried to build my own random forest tree:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
X = df.loc[:, df.columns != 'target']
y = df.loc[:, 'target'].values
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=1,
                            max_depth=2,
                            max_features=2,
                            random_state=0)
rf.fit(X_train, Y_train)
rf.feature_importances_
array([0.        , 0.11197953, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.88802047, 0.        , 0.        , 0.        ])
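Just for checking (not part of the tutorial), a small sketch like this maps the two non-zero values back to their feature names, assuming numpy is imported as np:

import numpy as np

# print only the features with a non-zero importance
for i in np.nonzero(rf.feature_importances_)[0]:
    print(data.feature_names[i], rf.feature_importances_[i])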
import matplotlib.pyplot as plt
from sklearn import tree

fn = data.feature_names
cn = data.target_names
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=800)
tree.plot_tree(rf.estimators_[0],
               feature_names=fn,
               class_names=cn,
               filled=True);
fig.savefig('rf_individualtree.png')
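Since plot_tree rounds the gini and sample values, here is a sketch (my own addition, not from the tutorial) that reads the exact node statistics straight from the fitted tree's internal arrays:

# exact (unrounded) node statistics behind the plotted tree
t = rf.estimators_[0].tree_
for node in range(t.node_count):
    print(node,
          'feature:', t.feature[node],                      # -2 means leaf node
          'gini:', t.impurity[node],
          'samples:', t.n_node_samples[node],
          'weighted samples:', t.weighted_n_node_samples[node])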
Then I calculated the feature importances by hand from the plotted tree above (sklearn's result is 0.11197953 and 0.88802047):
# weighted impurity decreases, using the gini values and sample counts
# shown in the plotted tree
a = (192/265) * (0.262 - (68/192)*0.452 - (124/192)*0.103)
b = (265/265) * (0.459 - (192/265)*0.262 - (73/265)*0.185) + (73/265) * (0.185 - (72/73)*0.173)
print(b / (a + b))
print(a / (a + b))
0.8625754868011606
0.13742451319883947
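For comparison, this is a sketch of the same weighted impurity decrease formula applied node by node, but using the tree's exact weighted sample counts instead of the rounded numbers from the plot (again assuming numpy is imported as np):

import numpy as np

t = rf.estimators_[0].tree_
n = t.weighted_n_node_samples
importances = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split to credit
        continue
    # weighted impurity decrease of this split
    decrease = (n[node] * t.impurity[node]
                - n[left] * t.impurity[left]
                - n[right] * t.impurity[right])
    importances[t.feature[node]] += decrease
importances /= importances.sum()
print(importances[importances > 0])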
Which part did I do wrong? My result is different from sklearn's answer, or does sklearn just not follow this formula?