
I have written the following custom evaluation function to use with xgboost, in order to optimize F1. Unfortunately, it raises an exception when run with xgboost.

The evaluation function is the following:

import numpy as np

def F1_eval(preds, labels):
    t = np.arange(0, 1, 0.005)
    f = np.repeat(0, 200)
    Results = np.vstack([t, f]).T

    P = sum(labels == 1)

    for i in range(200):
        m = (preds >= Results[i, 0])
        TP = sum(labels[m] == 1)
        FP = sum(labels[m] == 0)

        if (FP + TP) > 0:
            Precision = TP/(FP + TP)

        Recall = TP/P

        if (Precision + Recall >0) :
            F1 = 2 * Precision * Recall / (Precision + Recall)                
        else:                
            F1 = 0

        Results[i, 1] = F1

    return(max(Results[:, 1]))

Below I provide a reproducible example along with the error message:

    import numpy as np
    import xgboost as xgb
    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    Wine = datasets.load_wine()

    X_wine = Wine.data
    y_wine = Wine.target

    y_wine[y_wine == 2] = 1  # merge class 2 into class 1 to make the problem binary

    X_wine_train, X_wine_test, y_wine_train, y_wine_test = train_test_split(X_wine, y_wine, test_size = 0.2)

    clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1,silent=False, objective='binary:logistic', \
                      booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, \
                      subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)

    clf_wine.fit(X_wine_train, y_wine_train,\
    eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)], eval_metric=F1_eval, early_stopping_rounds=10, verbose=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-453-452852658dd8> in <module>()
     12 clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1,silent=False, objective='binary:logistic',                   booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,                   subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)
     13 
---> 14 clf_wine.fit(X_wine_train, y_wine_train,eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)], eval_metric=F1_eval, early_stopping_rounds=10, verbose=True)
     15 

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set)
    519                               early_stopping_rounds=early_stopping_rounds,
    520                               evals_result=evals_result, obj=obj, feval=feval,
--> 521                               verbose_eval=verbose, xgb_model=None)
    522 
    523         self.objective = xgb_options["objective"]

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, learning_rates)
    202                            evals=evals,
    203                            obj=obj, feval=feval,
--> 204                            xgb_model=xgb_model, callbacks=callbacks)
    205 
    206 

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     82         # check evaluation result.
     83         if len(evals) != 0:
---> 84             bst_eval_set = bst.eval_set(evals, i, feval)
     85             if isinstance(bst_eval_set, STRING_TYPES):
     86                 msg = bst_eval_set

C:\ProgramData\Anaconda3\lib\site-packages\xgboost\core.py in eval_set(self, evals, iteration, feval)
    957         if feval is not None:
    958             for dmat, evname in evals:
--> 959                 feval_ret = feval(self.predict(dmat), dmat)
    960                 if isinstance(feval_ret, list):
    961                     for name, val in feval_ret:

<ipython-input-383-dfb8d5181b18> in F1_eval(preds, labels)
     11 
     12 
---> 13         P = sum(labels == 1)
     14 
     15 

TypeError: 'bool' object is not iterable

I do not understand why the function is not working. I have followed the examples here: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py

I would like to understand where I went wrong.

  • Is there a reason not to use the sklearn F1 score function? – Eran Moshe Jul 30 '18 at 06:52
  • Is labels of type list, containing only 0s and 1s? – Eran Moshe Jul 30 '18 at 07:01
  • Thank you Eran for your thoughts. Could you please provide an example of how the F1 score function should be used with the Wine dataset that comes with sklearn? Moreover, I would appreciate it if you could explain to me what is wrong with my code; this would help me correct my thinking in this domain. – user8270077 Jul 30 '18 at 08:12
  • I've explained to you what is wrong with your code, as you can see in the answer provided. – Eran Moshe Jul 30 '18 at 08:14

1 Answer


When doing sum(labels == 1), the labels argument is not a numpy array here but the DMatrix that xgboost passes to the evaluation function, so labels == 1 evaluates to a single Boolean object rather than an element-wise comparison. Hence you get TypeError: 'bool' object is not iterable.

The built-in sum expects an iterable object, like a list. Here's an example of your error:

In[32]: sum(True)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-32-6eb8f80b7f2e>", line 1, in <module>
    sum(True)
TypeError: 'bool' object is not iterable
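
The same thing happens with the DMatrix that xgboost hands to the evaluation function. A minimal sketch of the failure mode (the variable names here are illustrative):

import numpy as np
import xgboost as xgb

# comparing a DMatrix to a scalar yields a single bool, not an array
dmat = xgb.DMatrix(np.random.rand(5, 3), label=np.array([0, 1, 1, 0, 1]))
print(dmat == 1)                    # False
print(sum(dmat.get_label() == 1))   # 3.0 -- works once the labels are extracted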

If you want to use f1_score from scikit-learn, you can implement the following wrapper:

from sklearn.metrics import f1_score
import numpy as np

def f1_eval(y_pred, dtrain):
    # xgboost passes the evaluation DMatrix; extract the label array from it
    y_true = dtrain.get_label()
    # return an error (1 - F1) so that a lower value is better
    err = 1 - f1_score(y_true, np.round(y_pred))
    return 'f1_err', err

The parameters of the wrapper are a list of predictions and a DMatrix, and it returns a (string, float) pair: the metric name and its value.

# Setting your classifier
clf_wine = xgb.XGBClassifier(max_depth=6, learning_rate=0.1,silent=False, objective='binary:logistic', \
                      booster='gbtree', n_jobs=8, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, \
                      subsample=0.8, colsample_bytree=0.8, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)

# When you fit, add eval_metric=f1_eval; the remaining arguments are the
# same as in the question
clf_wine.fit(X_wine_train, y_wine_train,
             eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)],
             eval_metric=f1_eval, early_stopping_rounds=10, verbose=True)

Here (https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py) you can see an example of how to implement a custom objective function and a custom evaluation metric. The example contains the following code:

# user defined evaluation function, return a pair metric_name, result
# NOTE: when you do customized loss function, the default prediction value is margin
# this may make builtin evaluation metric not function properly
# for example, we are doing logistic loss, the prediction is score before logistic transformation
# the builtin evaluation error assumes input is after logistic transformation
# Take this in mind when you use the customization, and maybe you need write customized evaluation function
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    # return a pair metric_name, result
    # since preds are margin(before logistic transformation, cutoff at 0)
    return 'error', float(sum(labels != (preds > 0.0))) / len(labels)

This specifies that an evaluation function takes (predictions, dtrain) as arguments, where dtrain is of type DMatrix, and returns a (string, float) pair: the name of the metric and the error value.
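
For reference, a sketch of how such a function is plugged into the low-level API; here params is an assumed parameter dict, dtrain/dtest are assumed DMatrix objects, and evalerror is the demo function quoted above:

import xgboost as xgb

# pass the custom metric as feval; xgboost reports it each boosting round
bst = xgb.train(params, dtrain, num_boost_round=10,
                evals=[(dtrain, 'train'), (dtest, 'eval')],
                feval=evalerror)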


Adding a working Python code example:

import numpy as np

def _F1_eval(preds, labels):
    t = np.arange(0, 1, 0.005)
    f = np.repeat(0, 200)
    results = np.vstack([t, f]).T
    # assuming labels contains only 0s and 1s
    n_pos_examples = sum(labels)
    if n_pos_examples == 0:
        raise ValueError("labels does not contain positive examples")

    for i in range(200):
        pred_indexes = (preds >= results[i, 0])
        TP = sum(labels[pred_indexes])
        FP = len(labels[pred_indexes]) - TP
        precision = 0
        recall = TP / n_pos_examples

        if (FP + TP) > 0:
            precision = TP / (FP + TP)

        if (precision + recall > 0):
            F1 = 2 * precision * recall / (precision + recall)
        else:
            F1 = 0
        results[i, 1] = F1
    return max(results[:, 1])

if __name__ == '__main__':
    labels = np.random.binomial(1, 0.75, 100)
    preds = np.random.random_sample(100)
    print(_F1_eval(preds, labels))

And if you want _F1_eval to work with xgboost's evaluation mechanism, add this wrapper:

def F1_eval(preds, dtrain):
    # extract the label array from the DMatrix; return an error (1 - best F1),
    # since by default xgboost treats the evaluation metric as a loss
    res = _F1_eval(preds, dtrain.get_label())
    return 'f1_err', 1 - res
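
It then drops in exactly where f1_eval was used above; a sketch reusing the classifier and Wine split from the question:

clf_wine.fit(X_wine_train, y_wine_train,
             eval_set=[(X_wine_train, y_wine_train), (X_wine_test, y_wine_test)],
             eval_metric=F1_eval, early_stopping_rounds=10, verbose=True)
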
  • @user8270077 I've fixed f1_eval as it didn't work. You can try it now, or implement F1 yourself. I'm curious, will you use sklearn F1? – Eran Moshe Jul 30 '18 at 11:28
  • Could you clarify something, Eran? F1 is dependent on the threshold. In every classification problem there is a threshold that maximizes F1. My code aims to find this best F1: it returns the maximum F1 after examining the F1s corresponding to 200 different thresholds (from 0 to 0.995). By using sklearn's f1_score, do I get the best F1 for the given predictions with the optimized threshold, or do I get an F1 score corresponding to an arbitrary predefined threshold which might not be optimal for the particular problem? – user8270077 Jul 31 '18 at 06:22
  • Not 100% sure, but I think you don't check all thresholds. In this line, `err = 1-f1_score(y_true, np.round(y_pred))`, we use np.round and thus set the threshold to 0.5. You might want to tweak that function a bit to try all the thresholds you set in your custom function and thus get the results you want. – Eran Moshe Jul 31 '18 at 06:28
  • This is in fact what I expected to hear. I would be more comfortable working through my own function, which by the way I have implemented successfully in R, and it works. Would it be too much to ask that you have a look at my function and see if you can fix it? I am a beginner in Python and your help would be greatly appreciated. I think it would be a worthwhile contribution to the community, as we often want to maximize F1 in classification problems, but not necessarily the F1 corresponding to the 0.5 threshold. – user8270077 Jul 31 '18 at 09:05
  • Sure, hold my beer. – Eran Moshe Jul 31 '18 at 09:35
  • Great Eran! Thanks a lot! – user8270077 Aug 01 '18 at 04:11
  • Hi Eran. I used the evaluation function, but for some reason I get 4 evaluation metrics in the output (instead of the expected two) and I am a bit confused. Would you mind taking a look at my related post here?: https://stackoverflow.com/questions/51626360/unexpected-behavior-from-xgboost-in-python-with-custom-evaluation-function – user8270077 Aug 01 '18 at 09:04
  • No need to make it 1-f1. Just use f1 and set maximize = True. – SadeepDarshana May 30 '20 at 05:40