XGBoost Algorithm: Feature importance change after renaming columns

Question

I ran an XGBoost algorithm on my data and found that a certain 15 features are important. I then renamed the columns in my DataFrame, ran the same XGBoost algorithm again, and noticed a change in the important features: the order is slightly jumbled and 2-3 new variables appear. The result is largely the same, but I was wondering what could have caused this change in feature importance, given that I only changed the column names. I used Tree SHAP to compute feature importance, and below is how I renamed the columns.

import pandas as pd

# Map each original column name ('Actual') to its new name ('To be changed')
colnames = pd.read_csv("kbmg_colnames.csv")
d = dict(zip(colnames['Actual'], colnames['To be changed']))
Data_test = Data_test.rename(columns=d)
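The thread does not settle the cause, but one thing worth ruling out (an assumption on my part, not something confirmed here) is that the rename is a pure one-to-one relabelling. `DataFrame.rename` silently ignores mapping keys that are not existing columns, and if two old names map to the same new name the result has duplicate columns. `check_rename_mapping` below is a hypothetical helper that works on plain lists, so it could be called as `check_rename_mapping(list(Data_test.columns), d)`.

```python
from collections import Counter


def check_rename_mapping(columns, mapping):
    """Return a list of problems that would make a rename more than a relabel.

    Hypothetical helper: an empty list means every mapped key exists and
    the renamed column names are still unique.
    """
    problems = []
    # Keys that are not real columns are silently ignored by DataFrame.rename.
    missing = [old for old in mapping if old not in columns]
    if missing:
        problems.append(f"not in data: {missing}")
    # Each label is mapped once, in order, as DataFrame.rename does.
    renamed = [mapping.get(col, col) for col in columns]
    dupes = sorted(name for name, n in Counter(renamed).items() if n > 1)
    if dupes:
        problems.append(f"duplicate names after rename: {dupes}")
    return problems


# Two old names collapsing onto one new name is flagged.
cols = ["age", "income", "tenure"]
assert check_rename_mapping(cols, {"age": "x1", "income": "x1"}) == [
    "duplicate names after rename: ['x1']"
]
```

If the check comes back empty and the column order is unchanged, the model sees the same data in the same positions, so any remaining difference would have to come from somewhere other than the names themselves.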

Yes, I am setting a random state. I am running the two DataFrames (before and after the column change) through exactly the same code. – Anjala Abdurehman Aug 13 '19 at 07:48

Answer

Almost every ML algorithm has a random_state parameter:

random_state : int, RandomState instance or None, optional (default=None)

    If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

To get the same result on every run, set it to a fixed integer, e.g. random_state=42. This is strongly recommended for any ML task where results need to be reproducible.
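The three cases quoted above can be sketched in a few lines of NumPy. `resolve_random_state` is a hypothetical helper modelled on how scikit-learn resolves this parameter (the library's own version is `sklearn.utils.check_random_state`); it illustrates the semantics and is not meant to replace the library function.

```python
import numbers

import numpy as np


def resolve_random_state(random_state=None):
    """Turn a random_state argument into a RandomState (illustrative sketch)."""
    if random_state is None:
        # None: use the global RandomState behind np.random
        # (a private attribute, accessed the same way scikit-learn does).
        return np.random.mtrand._rand
    if isinstance(random_state, numbers.Integral):
        # int: use it as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # RandomState instance: use it as-is.
        return random_state
    raise ValueError(f"{random_state!r} cannot be used to seed a RandomState")


# Two generators seeded with the same int produce the same draws.
first = resolve_random_state(42).rand(3)
second = resolve_random_state(42).rand(3)
assert np.array_equal(first, second)
```

A fixed seed makes a run repeatable only when everything else fed to the model (data, column order, hyperparameters) is also identical.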

See also: Random state (Pseudo-random number) in Scikit learn

answered Aug 13 '19 at 07:41 by PV8

I have set the random state to 42. – Anjala Abdurehman Aug 13 '19 at 08:03
The code I run is exactly the same, and I set random_state to 42. The only change I make is renaming the columns. This really baffles me! – Anjala Abdurehman Aug 13 '19 at 08:06
