
I have trained an XGBoost binary classifier, and I would like to extract feature importance for each observation I give to the model (I already have global feature importance).

More specifically, I am looking for a way to determine, for each instance given to the model, which features have the most impact and push the input toward one class or the other. I would like something like the top 5 features that make the observation belong to a given class, plus indications of how I should modify these 5 features so that the probability of belonging to this class decreases or increases.

For example, let's say my model predicts whether a house costs more than 100,000 dollars (the positive class) based on its location, surface area and number of bedrooms. I give it the following input: London, 400 square feet, 4 bedrooms, and my model predicts a 56% probability that the house is in the positive class. I am looking for a Python module or function that would show the most influential features for each observation.
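A common way to get this kind of per-observation explanation is Shapley values: each feature receives a signed share of the gap between the model's output for the observation and its output for a reference input, and the shares sum exactly to that gap. The sketch below is a model-agnostic, brute-force version, assuming only a small number of features. Everything in it is hypothetical and only for illustration: the `predict_proba` toy logistic model stands in for the trained XGBoost classifier (its weights are made up, with the intercept picked so the example house scores about 0.56), the numeric encoding `is_london` / `surface_sqft` / `bedrooms` is invented, and so is the reference house.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def predict_proba(x: Features) -> float:
    """Hypothetical stand-in for the trained classifier: P(price > 100,000)."""
    # Made-up logistic weights, NOT learned from data; the intercept is chosen
    # so that the example house below scores roughly 0.56.
    z = -4.46 + 1.5 * x["is_london"] + 0.005 * x["surface_sqft"] + 0.3 * x["bedrooms"]
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other features.

    A feature that is "absent" from a coalition takes its baseline value.
    The number of model calls grows exponentially with the number of features,
    so this brute-force version only suits a handful of features.
    """
    names = list(x)
    n = len(names)
    phi: Dict[str, float] = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition S of this size: |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without = {f: x[f] if f in present else baseline[f] for f in names}
                with_feature = {**without, name: x[name]}
                # Marginal effect of switching this feature from baseline to x.
                total += weight * (model(with_feature) - model(without))
        phi[name] = total
    return phi


def top_k(phi: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Features sorted by absolute contribution, largest first."""
    return sorted(phi.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


if __name__ == "__main__":
    house = {"is_london": 1.0, "surface_sqft": 400.0, "bedrooms": 4.0}
    # Hypothetical reference house that the explanation is measured against.
    reference = {"is_london": 0.0, "surface_sqft": 300.0, "bedrooms": 2.0}

    phi = shapley_values(predict_proba, house, reference)
    print(f"P(positive) = {predict_proba(house):.3f}")
    for name, value in top_k(phi, 5):
        direction = "raises" if value > 0 else "lowers"
        print(f"{name}: {value:+.3f} ({direction} the probability vs. the reference)")
```

A positive value means the feature's current value pushes the probability toward the positive class relative to the reference, so moving that feature toward its reference value should lower the probability; the explanation always depends on which reference is chosen. For a real XGBoost model you would not need the brute-force loop: `Booster.predict(..., pred_contribs=True)` returns per-instance SHAP contributions natively (in log-odds units, with a final bias column).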

Ismalyt

1 Answer


There are several methods for that. You can use the native importance measures from the xgboost library; check this answer: https://stackoverflow.com/a/51645066/3733974

You can also look for alternative methods. Here are two of them I can recommend:

mbh86