
Context

I am trying to use a custom loss function for an XGBoost binary classifier.

The idea was to implement the soft-Fbeta loss in XGBoost, which I read about here. Simply put: instead of using the standard logloss, use a loss function that directly optimises the Fbeta score.

Caveat

Of course, the Fbeta itself is not differentiable, so it can't be used straight out of the box. However, the idea is to use the predicted probabilities (i.e. before thresholding) to build continuous versions of TP, FP and FN. More details are in the referenced Medium article.
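
For reference, here is a minimal sketch of the soft-Fbeta the loss is built around, written directly from the probabilities (the function name and exact form are my own reading of the article, not taken from it verbatim):

import numpy as np

def soft_fbeta(y: np.ndarray, p: np.ndarray, beta: float) -> float:
    """Soft F-beta: TP, FP and FN are built from probabilities instead of thresholded predictions."""
    tp = np.sum(p * y)          # "soft" true positives
    fp = np.sum(p * (1 - y))    # "soft" false positives
    fn = np.sum((1 - p) * y)    # "soft" false negatives
    return (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp)

Maximising this quantity is the same as minimising its negative, which is what the gradient and Hessian below try to do.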

Attempt

My attempt was the following (inspired by a few different people).

import numpy as np
import xgboost as xgb

def gradient(y: np.array, p: np.array, beta: float):

    """Compute the gradient of the loss function. y is the true label, p
    the probability predicted by the model """
    
    # Define the denominator
    D = p.sum() + beta**2 * y.sum() 
    
    # Compute the gradient
    grad = (1 + beta**2) * y / D - (1 + beta**2) * (np.dot(p, y)) / D**2 
        
    return grad

def hessian(y: np.array, p: np.array, beta: float):

    """Compute the Hessian of the loss function. y is the true label, p
    the probability predicted by the model """
    
    # Define the denominator
    D = p.sum() + beta**2 * y.sum() 
    
    # Tensor sum y_i + y_j
    tensor_sum = y + y[:, None]
    
    # Compute the hessian
    hess = (1 + beta**2) / D**2 * (-tensor_sum + 2*np.dot(p, y) / D)
    
    return hess

def f_smooth_loss(beta: float):
    
    """ Custom loss function for maximising F score"""
    def custom_loss(y: np.array, p: np.array):
                
        # Actual custom loss
        b = beta
        
        # Compute grad
        grad = - gradient(y, p, b)
        
        # Compute hessian
        hess = - hessian(y, p, b)
                  
        return grad, hess
        
    return custom_loss

# Random train dataset
X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 2, 100)

# Random validation dataset
X_validation = np.random.rand(1000, 100)
y_validation = np.random.randint(0, 2, 1000)

# Define a classifier trying to maximise F5 score
model = xgb.XGBClassifier(objective=f_smooth_loss(5))

# Fit
model.fit(X_train, y_train,  eval_set=[(X_train, y_train), (X_validation, y_validation)])

Output

The model runs, but the output is apparently stuck, no matter what:

[0] validation_0-logloss:0.69315    validation_1-logloss:0.69315
[1] validation_0-logloss:0.69315    validation_1-logloss:0.69315
[2] validation_0-logloss:0.69315    validation_1-logloss:0.69315
[3] validation_0-logloss:0.69315    validation_1-logloss:0.69315

Comments

  1. It is possible my derivatives are not correct, even though I double-checked them. However, even when I change grad and hess to constant numbers, nothing changes.

  2. The Hessian here is a matrix (which would be its mathematical definition), but I think XGBoost expects a 1D array (presumably the diagonal; see the sketch after this list). However, because of point 1, nothing changes even if I pass it as a 1D array.

  3. Essentially, this model always predicts zeros, and does not update at all.

  4. Changing the size of the (fake) dataset does not lead to any change in the logloss (what's more, the numbers are exactly the same).

  5. Curiously, the logloss is the same for train and validation, which is yet another signal that something is deeply wrong somewhere.

  6. If I switch to the standard logloss (built-in), it updates (outputs are random, as the dataset is random).
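
To be concrete about point 2, this is the kind of 1D Hessian I also tried (a quick sketch reusing the functions above; the placeholder probabilities p are made up for illustration):

# p is a stand-in for the probabilities predicted at the current boosting round
p = np.full(y_train.shape, 0.5)

# keep only the diagonal of the full Hessian matrix, as a 1D array of length n
hess_diag = np.diag(hessian(y_train, p, beta=5))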

Question

What is wrong with my implementation? The XGBoost docs are pretty hard to decipher, and I can't really tell if I am missing a simple building block here.

JIST
GiacomoP
  • `model = xgb.XGBClassifier(scoring = f_smooth_loss(5))`?? – Enrique Pérez Herrero Jul 20 '23 at 17:38
  • This one works! I didn't find any particular reference in the docs though. Could you expand on your answer? – GiacomoP Jul 21 '23 at 08:05
  • That's wrong; if you don't specify an objective it will use the default (which is why it looks like it's working). – Gijs Wobben Jul 21 '23 at 09:58
  • Please check my answer – Enrique Pérez Herrero Jul 21 '23 at 20:24
  • I copied and ran this code. I don't understand either why the loss is stuck. Is it the input/output data that is wrong? Is it this specific function that won't work with this data? Is it the wrong version of the library? Looking into the non-Python default objectives does not enlighten much. Are there any other examples that work with a custom objective written in Python? – Vladimir Protsenko Jul 23 '23 at 21:26

2 Answers


The problem is that, following the docs, the custom loss function needs the following parameters as input:


....


from typing import Tuple

def f_smooth_loss(beta: float):

    """Custom loss function for maximising the F score."""
    def custom_loss(
        predt: np.ndarray,
        dtrain: xgb.DMatrix
    ) -> Tuple[np.ndarray, np.ndarray]:

        # Actual custom loss
        b = beta

        # The labels live inside the DMatrix; they are not passed as a separate argument
        y = dtrain.get_label()

        # Compute grad
        grad = - gradient(y, predt, b)

        # Compute hessian
        hess = - hessian(y, predt, b)

        return grad, hess

    return custom_loss


Update: following the documentation referenced above, it seems that you need to pass the function to xgb.train(), not when initializing the model, e.g.:

xgb.train({'tree_method': 'hist', 'seed': 1994},  # any other tree method is fine.
           dtrain=dtrain,
           num_boost_round=10,
           obj=f_smooth_loss(5))

Also, notice that the .fit() method is a wrapper that XGBoost provides as an interface to other sklearn objects (e.g. sklearn.pipeline), so it might lack this functionality; it's better to use the native .train() method.
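
For completeness, here is a minimal sketch of how the data from the question could be wrapped for the native API (reusing X_train, y_train, X_validation and y_validation; the evals list is only there to keep a per-round metric printout):

# wrap the numpy arrays in DMatrix objects, which the native API expects
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_validation, label=y_validation)

booster = xgb.train(
    {'tree_method': 'hist', 'seed': 1994},
    dtrain=dtrain,
    num_boost_round=10,
    obj=f_smooth_loss(5),
    evals=[(dtrain, 'train'), (dvalid, 'validation')],
)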

Jose
  • This is not working for me - I get exactly the same as before. However, changing from objective=f_smooth_loss(5) to scoring=f_smooth_loss(5), with either my previous definition or yours, does indeed make the loss move. – GiacomoP Jul 21 '23 at 08:02
  • Try using the method `.train()` instead of fit and pass there the loss function as the parameter `obj`. Check it in the updated answer. – Jose Jul 21 '23 at 11:51
  • I tried indeed. Up to minor modifications (due to the introduction of DMatrix) I still get a stationary prediction, fixed to 0.5. Also, I simplified the mock dataset, which now looks like `X_train = np.concatenate([np.ones((5, 5)), np.zeros((5, 5))])` and `y_train = np.concatenate([np.ones((5)), np.zeros((5))])`. It should be straightforward for the system to get it right. Also, the gradient is not zero and neither is the Hessian. – GiacomoP Jul 21 '23 at 12:39
  • Then maybe something in the formulation is wrong. If I have time this week I will try to check it out – Jose Jul 24 '23 at 07:54

Please change the classifier from objective=f_smooth_loss(5) to scoring=f_smooth_loss(5):

model = xgb.XGBClassifier(scoring = f_smooth_loss(5))
Enrique Pérez Herrero
  • `scoring` isn't even an allowed argument. By not setting the objective, it will use the default objective function for classification tasks. – Gijs Wobben Jul 24 '23 at 09:14