
I have seen this hinge loss chart:

https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss

And also here:

https://programmathically.com/understanding-hinge-loss-and-the-svm-cost-function/

However, when I create the "same" graph using scikit-learn, it looks quite similar but seems to be the "opposite". The code is as follows:

from sklearn.metrics import hinge_loss
import matplotlib.pyplot as plt
import numpy as np


# Hinge loss of each single prediction p against the true label y = 1
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

(plot: hinge loss for y_true == 1)

And some specific points in the chart above:

hinge_loss([1], [-5]) = 0.0,
hinge_loss([1], [-1]) = 0.0,
hinge_loss([1], [0])  = 1.0,
hinge_loss([1], [1])  = 2.0,
hinge_loss([1], [5])  = 6.0
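
For reference, here is a quick check of the same points against the textbook formula max(0, 1 - y*p), computed by hand without scikit-learn (textbook_hinge is just a throwaway helper for this snippet):

# Textbook hinge loss max(0, 1 - y*p), evaluated at the same points as above
def textbook_hinge(y, p):
    return max(0.0, 1.0 - y * p)

for p in (-5, -1, 0, 1, 5):
    print(f"y=+1, p={p:+d}: textbook hinge = {textbook_hinge(1, p)}")
# prints 6.0, 2.0, 1.0, 0.0, 0.0 -- the mirror image of the values above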
# Same plot, but now with the true label y = -1
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

(plot: hinge loss for y_true == -1)

And some specific points in the chart above:

hinge_loss([-1], [-5]) = 0.0,
hinge_loss([-1], [-1]) = 0.0,
hinge_loss([-1], [0])  = 1.0,
hinge_loss([-1], [1])  = 2.0,
hinge_loss([-1], [5])  = 6.0

Can someone explain to me why hinge_loss() in scikit-learn seems to be the opposite of the first two charts?

Many thanks in advance


EDIT: Based on the answer, I can reproduce the expected output without the values getting flipped. The idea is the following: since hinge_loss([0], [-1]) == 0 and hinge_loss([-2], [-1]) == 0, I can call hinge_loss() with an array of two values without altering the calculated loss.

The following code does not flip the values:

# y_true == 1: pad with a second sample (label 0, decision -1) that contributes
# zero loss; multiply by 2 to undo the averaging over the two samples
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, 0]), np.array([p, -1])) * 2
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

# y_true == -1: pad with a second sample (label -2, decision -1) that contributes
# zero loss; multiply by 2 to undo the averaging over the two samples
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, -2]), np.array([p, -1])) * 2
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

The question now is why, for each corresponding case, those "combinations" of values work.

Alberto
1 Answer


Looking at the code underlying the hinge_loss implementation, this is what happens in the binary case:

# labels are binarized to {-1, 1}
lbin = LabelBinarizer(neg_label=-1)
y_true = lbin.fit_transform(y_true)[:, 0]

try:
    # margin = y * f(x)
    margin = y_true * pred_decision
except TypeError:
    raise TypeError("pred_decision should be an array of floats.")

# hinge loss: max(0, 1 - margin), averaged over the samples
losses = 1 - margin
np.clip(losses, 0, None, out=losses)
np.average(losses, weights=sample_weight)

Because LabelBinarizer.fit_transform(), when given a single distinct label, defaults to returning an array of negative labels,

from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
lbin.fit_transform([1, 1, 1, 1, 1, 1, 1])   # returns array([[-1],[-1],[-1],[-1],[-1],[-1],[-1]])

the sign of the (unique) label gets flipped, which explains the plot you obtain.
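
To make this concrete, here is a small sketch (mine, not the library code) that redoes one of the points from the question by hand, following the excerpt above:

from sklearn.preprocessing import LabelBinarizer
import numpy as np

# Re-doing hinge_loss([1], [5]) step by step, following the excerpt above
lbin = LabelBinarizer(neg_label=-1)
y_bin = lbin.fit_transform([1])[:, 0]   # array([-1]): the single label 1 becomes -1
margin = y_bin * np.array([5.0])        # array([-5.])
loss = np.clip(1 - margin, 0, None)     # array([6.])
print(loss.mean())                      # 6.0, i.e. the value of hinge_loss([1], [5])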

Even though the single-label example is quite an edge case, there has of course been some debate on this issue; see e.g. https://github.com/scikit-learn/scikit-learn/issues/6723. Digging into the GitHub issues, it seems that no final decision has been reached yet on a potential fix.

Answer to the EDIT:

IMO, the way you're enriching y and p in the loop works because, that way, you're effectively "escaping" the single-label case (in particular, what really matters is how you're dealing with y). Indeed,

lbin = LabelBinarizer(neg_label=-1)
lbin.fit_transform(y_true)[:, 0]

with y_true=np.array([y, 0])=np.array([1, 0]) (first case) or with y_true=np.array([y, -2])=np.array([-1, -2]) (second case) returns array([ 1, -1]) in both cases. On the other hand, the other possibility for your second case that you mention in the comment, namely y_true=np.array([y, -1])=np.array([-1, -1]), does not let you escape the single-label case: lbin.fit_transform(np.array([-1, -1]))[:, 0] returns array([-1, -1]), so you fall back into the "bug" described above.
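
For completeness, a quick check of the three binarizations discussed above (it simply runs LabelBinarizer on the enriched arrays):

from sklearn.preprocessing import LabelBinarizer
import numpy as np

lbin = LabelBinarizer(neg_label=-1)
print(lbin.fit_transform(np.array([1, 0]))[:, 0])    # [ 1 -1] -> two distinct labels, no flip
print(lbin.fit_transform(np.array([-1, -2]))[:, 0])  # [ 1 -1] -> two distinct labels, no flip
print(lbin.fit_transform(np.array([-1, -1]))[:, 0])  # [-1 -1] -> single label, all mapped to neg_label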

amiola
  • Hi, many thanks. Could you review the edit to my question? Many thanks – Alberto Nov 22 '22 at 18:34
  • I'm not sure I correctly understand the edit part. In particular, I'm not getting why you're multiplying the loss by 2; plus, I'd use {-1, 1} or {0, 1} as labels (I'm not getting why you're enriching `y` and `p` the way you're doing; perhaps you're swapping (0,-1) and (-2,-1) by mistake?). Nevertheless, it seems to me that once you escape from the single label case as you're doing, things do work well. – amiola Nov 23 '22 at 10:11
  • Multiplying the loss by 2: the last line in the source code of hinge_loss() – https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b611bf873bd5836748647221480071a87/sklearn/metrics/_classification.py#L2611 – averages the loss. In the example provided, since only one component of the "enriched" array carries actual loss, I have to multiply by 2 to get the real loss. – Alberto Nov 24 '22 at 14:48
  • Enriching y and p: I'm trying to "escape" the single-label case so that hinge_loss() does not flip the signs. hinge_loss([-1], [0]) == 1 while hinge_loss([0], [-1]) == 0; this is why, in the first case, I'm enriching it this way. In the second case, even though both hinge_loss([-1], [-2]) == 0 and hinge_loss([-2], [-1]) == 0, I have to enrich with the second option; otherwise the obtained graph is flipped. – Alberto Nov 24 '22 at 14:49
  • Understood, I've tried to add a plausible answer. My point on labels was just that I'd have dealt with labels {1, -1} only, because you were distinguishing these two examples. Therefore enriching `y` with 0 in the first case was not really meaningful IMO; however, I was missing the point you're raising about the second example. – amiola Nov 24 '22 at 19:40