I would like to recover the feature names from my saved models, but this information seems to be lost when the model is saved. Steps:

  • train model
  • feature names exist
  • save model using Booster.save_model
  • load model using Booster.load_model
  • feature names of loaded model are None

Code to reproduce:


import pandas as pd
import xgboost as xgb

# Prep data
data = {'col_1': [3, 2, 1, 0], 'col_2': [1, 2, 3, 4], 'label': [1, 1, 0, 0]}
df = pd.DataFrame.from_dict(data)
y_train = df.pop('label')
x_train = df

# Train model
classifier = xgb.XGBClassifier(
    max_depth=3,
    learning_rate=0.1,
    n_estimators=3,
)
classifier.fit(x_train, y_train)
print(f'Trained, features: {classifier.get_booster().feature_names}')

# Save model
filename = 'test_model.bst'
classifier.save_model(filename)

# Load model
bst_model = xgb.Booster()
bst_model.load_model(filename)
print(f'Loaded: {bst_model.feature_names}')

How can I include the feature names in my model?

Paul

1 Answer

It seems that when you use save_model and load_model, the feature_names are lost in the loaded model, as mentioned here. However, you can use this workaround to restore your original feature names:

bst_model = xgb.Booster()
bst_model.load_model(filename)
bst_model.feature_names = list(df.columns)  # Pass a list of your feature names

Although it is hard-coded, it's the only workaround I have seen so far. This thread may be helpful for more insight.
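If the original DataFrame is not available when the model is loaded, one option is to store the feature names in a small JSON file next to the model. This is only a sketch: the helper names `save_feature_names` and `load_feature_names` and the `.features.json` suffix are my own invention, not part of xgboost.

```python
import json
from pathlib import Path


def save_feature_names(model_path, feature_names):
    """Write the feature names to a JSON file next to the model file.

    Hypothetical helper, not an xgboost API.
    """
    sidecar = Path(str(model_path) + ".features.json")
    sidecar.write_text(json.dumps(list(feature_names)))
    return sidecar


def load_feature_names(model_path):
    """Read back the feature names written by save_feature_names."""
    sidecar = Path(str(model_path) + ".features.json")
    return json.loads(sidecar.read_text())


# Intended use with the code from the question (assumes xgboost is installed):
#   classifier.save_model(filename)
#   save_feature_names(filename, x_train.columns)
#   ...
#   bst_model = xgb.Booster()
#   bst_model.load_model(filename)
#   bst_model.feature_names = load_feature_names(filename)
```

This keeps the names tied to the model file rather than to whatever DataFrame happens to be in scope, at the cost of having two files to move around together.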

Ro.oT
  • Thanks for confirming. I had seen the questions you linked, but I was hoping there would be some trick. It's a bit strange. If I remember after my vacation, I'm going to raise this as a bug. – Paul Jul 20 '23 at 12:45