I want to make predictions in a data science project, and the error is calculated through an asymmetric function.
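
For concreteness, here is a toy sketch of the kind of asymmetric error I mean (the exact form and the 2.0/1.0 weights are placeholders, not my real function):

import numpy as np

# Toy example only: over-prediction is penalized more heavily than
# under-prediction.
def asymmetric_error(y_true, y_pred):
    residual = y_pred - y_true
    return np.mean(np.where(residual > 0, 2.0, 1.0) * residual ** 2)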

Is it possible to tune the loss function of random forest or gradient boosting in sklearn?

I have read that it is necessary to modify a .pyx file, but I cannot find any in my sklearn folder (I am on Ubuntu 14.04 LTS).

Do you have any suggestions?

M.LTA

2 Answers


Yes, it is possible to tune the loss. For example, here is a custom objective written in the XGBoost style: the callable receives the raw predictions and the training data and returns the gradient and Hessian of the loss for each prediction:

import numpy as np


class ExponentialPairwiseLoss(object):
    """Pairwise exponential ranking objective (XGBoost-style):
    returns the gradient and Hessian with respect to the predictions."""

    def __init__(self, groups):
        # groups: sizes of the query groups, in the order the rows appear
        self.groups = groups

    def __call__(self, preds, dtrain):
        labels = dtrain.get_label().astype(int)
        rk = len(np.bincount(labels))  # number of distinct relevance levels
        plus_exp = np.exp(preds)
        minus_exp = np.exp(-preds)
        grad = np.zeros(preds.shape)
        hess = np.zeros(preds.shape)
        pos = 0
        for size in self.groups:
            # accumulate exp(+pred) and exp(-pred) per relevance level
            # within the current group
            sum_plus_exp = np.zeros((rk,))
            sum_minus_exp = np.zeros((rk,))
            for i in range(pos, pos + size):
                sum_plus_exp[labels[i]] += plus_exp[i]
                sum_minus_exp[labels[i]] += minus_exp[i]
            for i in range(pos, pos + size):
                # pairwise terms against lower- and higher-ranked items
                grad[i] = -minus_exp[i] * np.sum(sum_plus_exp[:labels[i]]) + \
                          plus_exp[i] * np.sum(sum_minus_exp[labels[i] + 1:])
                hess[i] = minus_exp[i] * np.sum(sum_plus_exp[:labels[i]]) + \
                          plus_exp[i] * np.sum(sum_minus_exp[labels[i] + 1:])
            pos += size
        return grad, hess
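
For the asymmetric error in the question, a much simpler objective in the same style may be enough. The following is a minimal sketch, not a drop-in solution: it assumes xgboost is available, and the name asymmetric_squared_objective and the over_weight=2.0 penalty are illustrative placeholders rather than the asker's actual loss.

import numpy as np

# Asymmetric squared error: over-prediction is penalized over_weight
# times as much as under-prediction. Gradient and Hessian are taken
# with respect to the raw predictions.
def asymmetric_squared_objective(preds, dtrain, over_weight=2.0):
    labels = dtrain.get_label()
    residual = preds - labels
    weight = np.where(residual > 0, over_weight, 1.0)
    grad = 2.0 * weight * residual   # first derivative of the loss
    hess = 2.0 * weight              # second derivative of the loss
    return grad, hess

# Hypothetical usage with the low-level xgboost API:
# import xgboost as xgb
# dtrain = xgb.DMatrix(X_train, label=y_train)
# booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=100,
#                     obj=asymmetric_squared_objective)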
Mark

You don't need to change anything in any file.

Modifying a library's .py files is generally a bad idea and should be avoided.

If you want to create your own scoring function, here is a link to sklearn's documentation that shows how to do it.
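
A minimal sketch of that approach, assuming an illustrative asymmetric error is wrapped with sklearn.metrics.make_scorer and used for cross-validation (the example data and the 2.0 penalty weight are placeholders):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

# Placeholder asymmetric error: over-prediction costs twice as much
# as under-prediction.
def asymmetric_error(y_true, y_pred):
    residual = y_pred - y_true
    return np.mean(np.where(residual > 0, 2.0, 1.0) * residual ** 2)

# greater_is_better=False because this is an error (lower is better);
# sklearn negates it so that higher scores still mean better models.
asymmetric_scorer = make_scorer(asymmetric_error, greater_is_better=False)

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0)

# The scorer is used to evaluate and compare fitted models; it does not
# change the loss minimized while the trees are grown.
scores = cross_val_score(model, X, y, scoring=asymmetric_scorer, cv=5)
print(scores)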

MMF
    Your link is for scoring, not for objective functions used for training. – Mikhail Korobov Nov 21 '16 at 18:48
  • 4
    Wrong. The "scoring function" linked to can be optimized during training. – Alex Nov 21 '16 at 23:56
  • 3
    Thanks MMF, but I understood like Mikhail. I understood that your link show to build a scorer used by the k-fold cross validation performed by scikitlearn. The scorer in your link is not used in the growing tree procedure. – M.LTA Nov 25 '16 at 12:48
  • 4
    Alex Miller shows how to define a custom objective function for linear regression (which simply computes the error based on an arbitrary loss function). https://alex.miller.im/posts/linear-model-custom-loss-function-regularization-python/ – cw' Jan 19 '19 at 08:27
  • Objective functions determine how the coefficients are updated/optimized. Scoring functions choose which set of optimized coefficients is "better" relative to other sets of optimized coefficients. I'm not sure if customizing only one of them can practically achieve the similar set of optimized coefficients. – Nuclear03020704 Oct 15 '21 at 16:38