
I tried two ways of implementing LightGBM. I expected them to return the same value, but they didn't.

I thought lgb.LGBMRegressor() and lgb.train(train_data, test_data) would return the same accuracy, but they didn't. Why is that?

Function to split the data:

def dataready(train, test, predictvar):
    # Separate the target column from the features
    y_train = train[predictvar].values
    y_test = test[predictvar].values
    train = train.drop([predictvar], axis=1)
    test = test.drop([predictvar], axis=1)
    x_train = train.values
    x_test = test.values
    return x_train, y_train, x_test, y_test, train

This is how I split the data:

import lightgbm as lgb

x_train, y_train, x_test, y_test, train2 = dataready(train, test, 'runtime.min')
train_data = lgb.Dataset(x_train, label=y_train)
test_data = lgb.Dataset(x_test, label=y_test)

Training the models:

lgb1 = lgb.LGBMRegressor()
lgb1.fit(x_train, y_train)
# Use a new name here; assigning to `lgb` would shadow the lightgbm module
model = lgb.train(parameters, train_data, valid_sets=test_data,
                  num_boost_round=5000, early_stopping_rounds=100)

I expected the results to be roughly the same, but they are not. As far as I understand, one is a booster and the other is a regressor?

Hang Nguyen
  • My guess is that `fit` is just the method used by the sklearn API of LightGBM (to make LightGBM usable in libraries built for sklearn) and `train` is the native method of LightGBM. So the difference is probably just caused by different default values. – jottbe Aug 27 '19 at 10:59
  • Did you get any insights? I have the exact same doubt. – Gustavo Apr 16 '20 at 17:39
  • I have the same issue: after testing 20 runs of each with the same sets of hyperparameters, the sklearn fit always gives me better results, and I don't understand why. – ADJ May 08 '20 at 16:02
  • Unfortunately, the LightGBM support team has closed a discussion on this topic: https://github.com/microsoft/LightGBM/issues/2930. I have the same issue. I have not set any parameters in either of them, but I get a huge difference between the two APIs. – Moradnejad Jan 31 '21 at 22:21

1 Answer


LGBMRegressor is the sklearn interface. The .fit(X, y) call is standard sklearn syntax for model training. It is a class object you can use as part of sklearn's ecosystem (for running pipelines, parameter tuning, etc.).

lightgbm.train is the core training API for lightgbm itself.

XGBoost and many other popular ML training libraries have a similar split (the core API uses xgb.train(...), for example, while the sklearn API uses XGBClassifier or XGBRegressor).

Michael Xu