Thanks to @Laassairi Abdellah, who was able to point me toward incremental training. Armed with that knowledge I've made this function:
import xgboost as xgb
import numpy as np

def fine_tune(model_, X, y, loop=False, num_boost_rounds=30, params=None):
    """
    Fine-tune an XGBoost model using incremental training.

    Args:
    - model_: str, xgboost.core.Booster, or sklearn-style estimator; the path to / object of the model to be fine-tuned.
    - X: array-like, shape (n_samples, n_features), input data for training.
    - y: array-like, shape (n_samples,), output (target) data for training.
    - loop: bool, repeat the training process until the model predicts y from X perfectly.
    - num_boost_rounds: int, number of boosting rounds per call to xgb.train.
    - params: dict, parameters for the model.

    Returns:
    - model: the fine-tuned XGBoost Booster.
    """
    if isinstance(model_, str):
        # Load the existing model from file
        model = xgb.Booster()
        model.load_model(model_)
    elif isinstance(model_, xgb.Booster):
        model = model_
    else:
        # Assume an sklearn-style estimator (e.g. XGBClassifier) and unwrap its Booster
        try:
            model = model_.get_booster()
        except AttributeError:
            raise ValueError("The model must be a path to a saved model, a Booster, or an sklearn-style XGBoost estimator.")
    if isinstance(model_, (xgb.Booster, str)):
        assert params is not None, "The params argument must be provided when loading a model from a file or passing a Booster."
    param = params if params is not None else model_.get_params()
    # Convert the input to DMatrix
    dX = xgb.DMatrix(X, label=y)
    # Continue training from the existing booster via the xgb_model argument
    model = xgb.train(param, dX, num_boost_rounds, xgb_model=model)
    if loop:
        # Keep boosting until every training sample is classified correctly.
        # Note: this never terminates if the model cannot fit the data perfectly.
        while True:
            y_pred = model.predict(dX)
            y_pred = np.where(y_pred > 0.5, 1, 0)
            if np.all(y_pred == y):
                break
            model = xgb.train(param, dX, num_boost_rounds, xgb_model=model)
    if not isinstance(model_, (str, xgb.Booster)):
        # Update the wrapper's internal booster so model_ reflects the new training
        model_._Booster = model
    return model
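To sanity-check the function end to end, here is a minimal, self-contained demo on synthetic data (the parameter values here are arbitrary picks for illustration, not recommendations):

# Train a small binary classifier from scratch, then fine-tune it on new samples.
rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)
X_new, y_new = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)

demo_params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}
base = xgb.train(demo_params, xgb.DMatrix(X_old, label=y_old), num_boost_round=30)

# Since we pass a Booster, params must be supplied explicitly.
tuned = fine_tune(base, X_new, y_new, params=demo_params)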
The loop section of the fine_tune function is specific to my use case of binary classification, where the target is either 1 or 0, hence the hard 0.5 threshold on the predictions; for other objectives the exit condition needs to change, as in the sketch below.
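For example, a regression variant might stop once the training RMSE drops below a tolerance rather than demanding exact matches. A minimal sketch, where tol is a hypothetical threshold and not part of the original function:

# Sketch of a regression-style exit condition (assumes a squared-error objective).
# tol is a hypothetical tolerance, not an argument of the original fine_tune.
tol = 1e-3
while True:
    rmse = np.sqrt(np.mean((model.predict(dX) - y) ** 2))
    if rmse < tol:
        break
    model = xgb.train(param, dX, num_boost_rounds, xgb_model=model)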
Example usage:
fine_tune(
    model,
    np.array([word2vec.get_mean_vector(tokenize(
        "The delivery was a tiny bit late but the product was sleek and high quality"
    ))]),
    np.array([1]),
    loop=True,
)
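In that call, model is my already trained classifier, and word2vec / tokenize come from my embedding pipeline. For a runnable stand-in, something gensim-based along these lines should work (the pretrained model name and the simple_preprocess tokenizer are my assumptions, not the original setup, and get_mean_vector needs a recent gensim):

# Hypothetical stand-ins for the embedding pipeline used in the example above.
import gensim.downloader as api
from gensim.utils import simple_preprocess

word2vec = api.load("glove-wiki-gigaword-50")  # any gensim KeyedVectors model works

def tokenize(text):
    # Lowercase and split the text into tokens the vectors can look up
    return simple_preprocess(text)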