I have a random forest classifier which gave me a feature importance rank.
How can I derive statistical significance of the important features, similar to a regression model where you can infer statistical significance of the betas?
Your question is a bit too broad and unclear.
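An easy way to view the feature_importances_ values as percentages is to normalize them:
# clf is a fitted RandomForestClassifier; scikit-learn already scales
# feature_importances_ to sum to 1, so this mostly converts them to percent
importance_sum = sum(clf.feature_importances_)
feature_importance_as_percent = [100 * (x / importance_sum) for x in clf.feature_importances_]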
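Other methods would involve parametric or non-parametric tests, for example a permutation test that compares each observed importance with the importances obtained after shuffling the class labels.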
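One non-parametric option is a label-permutation test, sketched below. It is not something the forest reports itself: the idea is to refit the model on shuffled labels many times, build a null distribution for each feature's importance, and take the p-value as the fraction of null importances at least as large as the observed one. The helper name `importance_p_values`, the synthetic data from `make_classification`, and the settings (`n_permutations=30`, `n_estimators=50`) are assumptions for illustration only; in practice you would likely use many more permutations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def importance_p_values(X, y, n_permutations=30, random_state=0):
    """Hypothetical helper: empirical p-values for impurity-based importances."""
    rng = np.random.default_rng(random_state)

    def fit_importances(labels):
        # Refit a fresh forest and return its impurity-based importances.
        clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
        clf.fit(X, labels)
        return clf.feature_importances_

    observed = fit_importances(y)

    # Null distribution: importances when the labels carry no information.
    null = np.array(
        [fit_importances(rng.permutation(y)) for _ in range(n_permutations)]
    )

    # Fraction of null importances >= observed, with the usual +1 correction
    # so that a p-value is never exactly zero.
    p_values = (1 + (null >= observed).sum(axis=0)) / (1 + n_permutations)
    return observed, p_values


# Synthetic example data (assumption): with shuffle=False the first two
# columns are the informative features, the remaining four are noise.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

observed, p_values = importance_p_values(X, y)
for i, (imp, p) in enumerate(zip(observed, p_values)):
    print(f"feature {i}: importance={imp:.3f}, p-value={p:.3f}")
```

Because the importances sum to 1, they are not independent of each other, and testing many features at once calls for a multiple-comparison correction, so treat these p-values as a rough guide rather than an exact analogue of the p-values on regression betas.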
Read also this: How are feature_importances in RandomForestClassifier determined?