I'm having difficulty loading an XGBoost regression model with both pickle and joblib.
One complication may be that I am writing the pickle/joblib file on a Windows desktop but trying to load it on a MacBook Pro.
I attempted the solution previously posted in Python 3 - Can pickle handle byte objects larger than 4GB?; however, it still does not work. I get a variety of errors, usually something like:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
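For reference, the chunked read I adapted from that answer looks roughly like this (a sketch; 'file.sav' matches the dump in the training code below, and the chunk size comes from that post):

import os
import pickle

file_path = 'file.sav'
max_bytes = 2**31 - 1  # read in chunks below the 2 GB limit, per the linked answer

bytes_in = bytearray(0)
input_size = os.path.getsize(file_path)
with open(file_path, 'rb') as f_in:
    for _ in range(0, input_size, max_bytes):
        bytes_in += f_in.read(max_bytes)
xgb_model = pickle.loads(bytes_in)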
I have also tried using protocol=4 in both the pickle and joblib dumps, and in each case the file still could not be loaded.
The files I am trying to load have ranged anywhere from 2 GB to 11 GB, depending on whether I used joblib/pickle or the bytes_in/os.path solution previously posted.
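The plain load attempts on the Mac are roughly the following (a sketch; 'file.sav' is the name used in the training code below):

import joblib
import pickle

# joblib-written file
xgb_model = joblib.load('file.sav')

# or, for a pickle-written file
with open('file.sav', 'rb') as f_in:
    xgb_model = pickle.load(f_in)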
Does anyone know a good way to write large XGBoost regression models to disk, and/or how to then load them?
Here is the code used to train and write the XGBoost model:
import xgboost as xgb
import joblib

# build DMatrix objects for training and validation
dmatrix_train = xgb.DMatrix(
    X_train.values, y_train, feature_names=X_train.columns.values
)
dmatrix_validate = xgb.DMatrix(
    X_test.values, y_test, feature_names=X_train.columns.values
)
eval_set = [(dmatrix_train, "Train")]
eval_set.append((dmatrix_validate, "Validate"))

print("XGBoost #1")
params = {
    'silent': 1,
    'tree_method': 'auto',
    'max_depth': 10,
    'learning_rate': 0.001,
    'subsample': 0.1,
    'colsample_bytree': 0.3,
    # 'min_split_loss': 10,
    'min_child_weight': 10,
    # 'lambda': 10,
    # 'max_delta_step': 3
}
num_round = 500000

xgb_model = xgb.train(params=params, dtrain=dmatrix_train, evals=eval_set,
                      num_boost_round=num_round, verbose_eval=100)

joblib.dump(xgb_model, 'file.sav', protocol=4)
The final line has also been tried with standard pickle dumping as well, both with 'wb' and without it.
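For completeness, the plain-pickle variants of that last line look roughly like this (a sketch; the chunked write follows the bytes_in/os.path answer linked above):

import pickle

# straightforward dump with protocol=4
with open('file.sav', 'wb') as f_out:
    pickle.dump(xgb_model, f_out, protocol=4)

# chunked write, as in the linked answer
bytes_out = pickle.dumps(xgb_model, protocol=4)
max_bytes = 2**31 - 1
with open('file.sav', 'wb') as f_out:
    for idx in range(0, len(bytes_out), max_bytes):
        f_out.write(bytes_out[idx:idx + max_bytes])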