
This question is about understanding, internally, how LightGBM predicts the probability for each class.

Other packages, like sklearn, provide thorough detail for their classifiers. For example:

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used, i.e. calculate the probability of each class assuming it to be positive using the logistic function, and normalize these values across all the classes.
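
To make the distinction concrete, here is a small numpy sketch (illustrative values only, not sklearn's code) of the two normalization schemes described above:

import numpy as np

# Raw decision-function scores for one sample over three classes
# (made-up values, purely for illustration).
scores = np.array([1.2, -0.3, 0.4])

# "multinomial": softmax over the class scores.
softmax = np.exp(scores) / np.exp(scores).sum()

# one-vs-rest: logistic function per class, then normalize to sum to 1.
logistic = 1.0 / (1.0 + np.exp(-scores))
ovr = logistic / logistic.sum()

print(softmax)  # sums to 1
print(ovr)      # also sums to 1, but generally different values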

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.

There are additional Stack Overflow questions that provide similar details for other classifiers.

I am trying to uncover those same details for LightGBM's predict_proba function. The documentation does not list the details of how the probabilities are calculated.

The documentation simply states:

Return the predicted probability for each class for each sample.

The source code is below:

def predict_proba(self, X, raw_score=False, start_iteration=0, num_iteration=None,
                  pred_leaf=False, pred_contrib=False, **kwargs):
    """Return the predicted probability for each class for each sample.

    Parameters
    ----------
    X : array-like or sparse matrix of shape = [n_samples, n_features]
        Input features matrix.
    raw_score : bool, optional (default=False)
        Whether to predict raw scores.
    start_iteration : int, optional (default=0)
        Start index of the iteration to predict.
        If <= 0, starts from the first iteration.
    num_iteration : int or None, optional (default=None)
        Total number of iterations used in the prediction.
        If None, if the best iteration exists and start_iteration <= 0, the best iteration is used;
        otherwise, all iterations from ``start_iteration`` are used (no limits).
        If <= 0, all iterations from ``start_iteration`` are used (no limits).
    pred_leaf : bool, optional (default=False)
        Whether to predict leaf index.
    pred_contrib : bool, optional (default=False)
        Whether to predict feature contributions.

        .. note::

            If you want to get more explanations for your model's predictions using SHAP values,
            like SHAP interaction values,
            you can install the shap package (https://github.com/slundberg/shap).
            Note that unlike the shap package, with ``pred_contrib`` we return a matrix with an extra
            column, where the last column is the expected value.

    **kwargs
        Other parameters for the prediction.

    Returns
    -------
    predicted_probability : array-like of shape = [n_samples, n_classes]
        The predicted probability for each class for each sample.
    X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
        If ``pred_leaf=True``, the predicted leaf of every tree for each sample.
    X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
        If ``pred_contrib=True``, the feature contributions for each sample.
    """
    result = super(LGBMClassifier, self).predict(X, raw_score, start_iteration, num_iteration,
                                                 pred_leaf, pred_contrib, **kwargs)
    if callable(self._objective) and not (raw_score or pred_leaf or pred_contrib):
        warnings.warn("Cannot compute class probabilities or labels "
                      "due to the usage of customized objective function.\n"
                      "Returning raw scores instead.")
        return result
    elif self._n_classes > 2 or raw_score or pred_leaf or pred_contrib:
        return result
    else:
        return np.vstack((1. - result, result)).transpose()

How can I understand how exactly the predict_proba function for LightGBM is working internally?

artemis

2 Answers


LightGBM, like all gradient boosting methods for classification, essentially combines decision trees and logistic regression. We start with the same logistic function representing the probabilities (its multiclass generalization is the softmax):

P(y = 1 | X) = 1 / (1 + exp(-Xw))

The interesting twist is that the feature matrix X is composed of the terminal nodes of a decision tree ensemble. These are all then weighted by w, a parameter that must be learned. The mechanism used to learn the weights depends on the precise learning algorithm, and so does the construction of X. LightGBM, for example, introduced two novel features which give it its performance edge over XGBoost: "Gradient-based One-Side Sampling" and "Exclusive Feature Bundling". Generally, though, each row collects the terminal leaves for each sample and the columns represent those terminal leaves.

So here is what the docs could say...

Probability estimates.

The predicted class probabilities of an input sample are computed as the softmax of the weighted terminal leaves from the decision tree ensemble corresponding to the provided sample.
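
For the common binary case this can be checked numerically: with LightGBM's default binary objective, predict_proba() is just the logistic function applied to the raw score (the summed, weighted leaf values). A minimal sketch, assuming synthetic data from sklearn's make_classification:

import numpy as np
from scipy.special import expit            # the logistic (sigmoid) function
from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

# Synthetic binary data, used only to illustrate the relationship.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LGBMClassifier(n_estimators=50, random_state=0).fit(X, y)

raw = clf.predict_proba(X, raw_score=True)  # raw scores, before the link function
proba = clf.predict_proba(X)                # shape (n_samples, 2)

# P(y=1) is the sigmoid of the raw score; P(y=0) is its complement.
assert np.allclose(proba[:, 1], expit(raw))
assert np.allclose(proba[:, 0], 1.0 - expit(raw))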

For further details, you'd have to delve into the details of boosting, XGBoost, and finally the LightGBM paper, but that seems a bit heavy-handed given the other documentation examples you've given.

fny

Short Explanation

Below we can see an illustration of what each method calls under the hood. First, the predict_proba() method of the LGBMClassifier class calls the predict() method it inherits from LGBMModel.

LGBMClassifier.predict_proba() (inherits from LGBMModel)
  |---->LGBMModel().predict() (calls LightGBM Booster)
          |---->Booster.predict()

Then, it calls the predict() method from the LightGBM Booster (the Booster class). In order to call this method, the Booster must be trained first.

Basically, the Booster is the one that generates the predicted value for each sample by calling its predict() method. See below for a detailed follow-up of how this Booster works.
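
A quick way to see this delegation in practice is the sketch below (it assumes the default binary objective and synthetic data): the wrapper's probabilities can be reproduced directly from the underlying Booster.

import numpy as np
from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LGBMClassifier(n_estimators=30, random_state=0).fit(X, y)

# The trained Booster is exposed as clf.booster_; for a binary objective
# Booster.predict() already returns P(y=1) for each sample.
booster_proba = clf.booster_.predict(X)
wrapper_proba = clf.predict_proba(X)

# predict_proba() simply stacks [1 - p, p] into two columns (see the
# np.vstack call in the source quoted in the question).
assert np.allclose(wrapper_proba[:, 1], booster_proba)
assert np.allclose(wrapper_proba[:, 0], 1.0 - booster_proba)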

Detailed Explanation, or: How does the LightGBM Booster work?

We seek to answer the question: how does the LightGBM Booster work? By going through the Python code we can get a general idea of how it is trained and updated. There are further references to LightGBM's C++ libraries that I'm not in a position to explain, but a general glimpse of the Booster's workflow is given below.

A. Initializing and Training the Booster

The _Booster of LGBMModel is initialized by calling the train() function; on line 595 of sklearn.py we see the following code:

self._Booster = train(params, train_set,
                      self.n_estimators, valid_sets=valid_sets, valid_names=eval_names,
                      early_stopping_rounds=early_stopping_rounds,
                      evals_result=evals_result, fobj=self._fobj, feval=feval,
                      verbose_eval=verbose, feature_name=feature_name,
                      callbacks=callbacks, init_model=init_model)

Note. train() comes from engine.py.

Inside train() we see that the Booster is initialized (line 231)

# construct booster
try:
    booster = Booster(params=params, train_set=train_set)
...

and updated at every training iteration (line 242).

for i in range_(init_iteration, init_iteration + num_boost_round):
     ...
     ... 
     booster.update(fobj=fobj)
     ...
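
In user code the same flow looks roughly like the sketch below (hypothetical random data): train() builds a Booster from a Dataset and updates it num_boost_round times, which is what LGBMModel.fit() does internally.

import numpy as np
import lightgbm as lgb

# Hypothetical random data, just to exercise the training loop.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
train_set = lgb.Dataset(X, label=y)

# engine.train(): constructs a Booster and calls booster.update()
# num_boost_round times before returning it.
booster = lgb.train({"objective": "binary", "verbosity": -1},
                    train_set, num_boost_round=10)

print(booster.predict(X)[:5])   # probabilities under the binary objective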

B. How does booster.update() work?

To understand how the update() method works we should go to line 2315 of basic.py. Here, we see that this function updates the Booster for one iteration.

There are two alternatives for updating the booster, depending on whether or not you provide an objective function.

  • Objective Function is None

On line 2367 we get to the following code

if fobj is None:
    ...
    ...
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
               self.handle,
               ctypes.byref(is_finished)))
    self.__is_predicted_cur_iter = [False for _ in range_(self.__num_dataset)]
    return is_finished.value == 1

Notice that since the objective function (fobj) is not provided, it updates the booster by calling LGBM_BoosterUpdateOneIter from _LIB. In short, _LIB is the loaded C++ LightGBM library.

What is _LIB?

_LIB is a variable that stores the loaded LightGBM library by calling _load_lib() (line 29 of basic.py).

Then _load_lib() loads the LightGBM library by finding the path on your system to lib_lightgbm.dll (Windows) or lib_lightgbm.so (Linux).
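
Conceptually, the loading step boils down to something like this sketch (the path here is hypothetical; the real code resolves it for your platform):

import ctypes

# Hypothetical path; lightgbm finds the real one on your system.
lib_path = "/usr/local/lib/lib_lightgbm.so"

# Load the compiled library so its C entry points (e.g.
# LGBM_BoosterUpdateOneIter, LGBM_BoosterGetPredict) become callable.
_LIB = ctypes.cdll.LoadLibrary(lib_path)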

  • Objective Function provided

When a custom objective function is provided, we get to the following case

else:
    ...
    ...
    grad, hess = fobj(self.__inner_predict(0), self.train_set)

where __inner_predict() is a method of LightGBM's Booster (see line 1930 of basic.py for more details of the Booster class) which predicts for the training and validation data. Inside __inner_predict() (line 3142 of basic.py) we see that it calls LGBM_BoosterGetPredict from _LIB to get the predictions, that is,

_safe_call(_LIB.LGBM_BoosterGetPredict(
                self.handle,
                ctypes.c_int(data_idx),
                ctypes.byref(tmp_out_len),
                data_ptr))
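
To connect this back to probabilities: a custom fobj receives the raw scores from __inner_predict() and must return the gradient and hessian of the loss with respect to those raw scores. Here is a sketch of what a hand-written binary log-loss objective (mirroring the built-in one) could look like:

import numpy as np

def binary_logloss_objective(preds, train_data):
    """Sketch of a custom objective: preds are raw scores, not probabilities."""
    y = train_data.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid turns raw scores into P(y=1)
    grad = p - y                       # first derivative of log-loss w.r.t. raw score
    hess = p * (1.0 - p)               # second derivative of log-loss w.r.t. raw score
    return grad, hess

# Passed as fobj, this hits the `else` branch above. Note that with a custom
# objective predict_proba() warns and returns raw scores (see the source quoted
# in the question), because LightGBM no longer knows which link function maps
# them to probabilities.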

Finally, after being updated num_boost_round times (the range_(init_iteration, init_iteration + num_boost_round) loop above), the booster is trained. Thus, Booster.predict() can be called by LGBMClassifier.predict_proba().

Note. The booster is trained as part of the model fitting step, specifically by LGBMModel.fit(); see line 595 of sklearn.py for code details.

Miguel Trejo
  • Thanks for tracing the code back... is there any way to understand what this means? Again, looking for an explanation of how the probabilities are calculated, as specified in the question, and in the 4 examples posted. – artemis Aug 27 '20 at 03:00
  • @wundermahn, it could even go further down to understand the C++ code behind LGBM_BoosterGetPredict. I'm not decent at C. Is that the kind of explanation you're looking for (the c source code)? – Miguel Trejo Aug 27 '20 at 14:01
  • Yeah, I think a lot of these libraries are ultimately written in `C++` or `C`. The intention is to have an answer that describes how the probabilities are calculated; not necessarily the logic or code flow. There are several examples of questions that have provided answers similar to what I am looking for above. The best, in my opinion, was for SVM[1] [https://stackoverflow.com/questions/15111408/how-does-sklearn-svm-svcs-function-predict-proba-work-internally]. – artemis Aug 27 '20 at 14:53