I ran XGBoost on my data and found that a particular set of 15 features was important. I then renamed the columns in my DataFrame and ran the same XGBoost model again, and the important features changed: the order in the matrix is slightly shuffled and 2-3 new variables appear. The result is largely the same, but I am wondering what could have caused the change, given that I only changed the column names. I used Tree SHAP to compute feature importance. Below is how I renamed the columns.

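import pandas as pd

# Mapping file with the old names in 'Actual' and the new names in 'To be changed'
colnames = pd.read_csv("kbmg_colnames.csv")
d = dict(zip(colnames['Actual'], colnames['To be changed']))
Data_test = Data_test.rename(columns=d)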
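Before looking at the model, it may be worth confirming that the rename changed nothing but the labels. The sketch below is only an illustration: the small `Data_test` and `colnames` frames are hypothetical stand-ins for the real data and for `kbmg_colnames.csv`, and the names `unmapped`, `has_duplicates` and `same_values` are made up for this example.

```python
import pandas as pd

# Hypothetical stand-ins for the real Data_test and kbmg_colnames.csv
Data_test = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
colnames = pd.DataFrame(
    {"Actual": ["a", "b", "c"], "To be changed": ["x", "y", "z"]}
)

d = dict(zip(colnames["Actual"], colnames["To be changed"]))
renamed = Data_test.rename(columns=d)

# Columns that are missing from the mapping silently keep their old names
unmapped = [c for c in Data_test.columns if c not in d]

# Two old names mapped to the same new name would create duplicate columns
has_duplicates = renamed.columns.duplicated().any()

# rename only changes labels, so the values should be identical
same_values = (renamed.to_numpy() == Data_test.to_numpy()).all()

print(unmapped, has_duplicates, same_values)
```

If `unmapped` is empty, there are no duplicates, and the values match, the rename itself is unlikely to be what changed the result.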

1 Answer

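Almost every ML algorithm that involves randomness has a random_state parameter. If it is not fixed, two runs on the same data can give slightly different results, which is a likely explanation when only the column names have changed.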

random_state : int, RandomState instance or None, optional (default=None)

    If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

To get the same result on every run, set it to a fixed integer, for example random_state=42. This is highly recommended for any ML task where results need to be reproducible.
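To illustrate what a fixed seed does, here is a minimal sketch using NumPy's `RandomState`, as described in the parameter documentation quoted above. `sample_columns` is a hypothetical helper written for this example, not part of XGBoost or scikit-learn; it only mimics the kind of random column subsampling a tree learner may perform.

```python
import numpy as np


def sample_columns(n_columns, fraction, random_state=None):
    """Pick a random subset of column indices (hypothetical helper)."""
    # An int seeds the generator; None draws a fresh seed on every call
    rng = np.random.RandomState(random_state)
    k = max(1, int(n_columns * fraction))
    return np.sort(rng.choice(n_columns, size=k, replace=False))


first = sample_columns(20, 0.5, random_state=42)
second = sample_columns(20, 0.5, random_state=42)
unseeded = sample_columns(20, 0.5)  # may differ from run to run

# The same seed always yields the same subset
print(np.array_equal(first, second))
```

In the same way, passing the same `random_state` to the model in both runs removes the seed as a source of difference between them.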

Random state (Pseudo-random number) in Scikit learn

PV8