
I'm in a situation where I need to train a model to predict a scalar value, and it's important that the predicted value be in the same direction (i.e. have the same sign) as the true value, while also keeping the squared error to a minimum.

What would be a good choice of loss function for that?

For example:

Let's say the predicted value is -1 and the true value is 1. The loss between the two should be a lot greater than the loss between 3 and 1, even though the squared errors of (3, 1) and (-1, 1) are equal.

Thanks a lot!

  • You can create any custom logic. How much difference do you expect between the losses of (3, 1) and (-1, 1)? – Akash Goyal Jun 06 '18 at 05:40
  • @AkashGoyal I'm not sure how large the difference between the two should be, but I expect the penalty for (-1, 1) to be higher than for (3, 1). Right now my workaround is to add a constant to the regular squared error when the signs differ, and add 0 otherwise (sketched just below this thread). Thank you for your comments. – Ethan Chen Jun 06 '18 at 09:06
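
For reference, a minimal sketch of that workaround, assuming NumPy arrays; the function name squared_error_with_sign_penalty and the constant sign_penalty are illustrative, not from the comment:

import numpy as np

def squared_error_with_sign_penalty(y_true, y_pred, sign_penalty=4.0):
    # Squared error plus a fixed penalty wherever the signs disagree.
    se = (y_true - y_pred) ** 2
    # np.sign(y_true) * np.sign(y_pred) is negative exactly when the signs differ.
    sign_mismatch = (np.sign(y_true) * np.sign(y_pred)) < 0
    return se + sign_penalty * sign_mismatch  # boolean mask acts as 0/1

# (-1, 1) is now penalized more than (3, 1), even though both have squared error 4.
print(squared_error_with_sign_penalty(np.array([1.0, 1.0]), np.array([-1.0, 3.0])))
# -> [8. 4.]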

2 Answers


This turned out to be a really interesting question - thanks for asking it! First, remember that you want your loss function to be composed entirely of differentiable operations, so that you can back-propagate through it. This means that arbitrary logic won't necessarily do. To restate your problem: you want a differentiable function of two variables that increases sharply when the two variables have different signs, and more slowly when they share the same sign. Additionally, you want some control over how sharply the two cases grow relative to one another, so we want something with two configurable constants. I started constructing a function that met these needs, but then remembered one you can find in any high school geometry textbook: the elliptic paraboloid!

A rotated elliptic paraboloid.

The standard (axis-aligned) formulation doesn't distinguish sign agreement from sign disagreement, so I had to introduce a rotation. The plot above is the result. Note that it increases more sharply when the signs disagree, less sharply when they agree, and that the constants controlling this behaviour are configurable. The code below is all that was needed to define and plot the loss function. I don't think I've ever used a geometric form as a loss function before - really neat.
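
Concretely, with rotation angle \(t = \pi/4\), the surface above is

\[
x' = x\cos t + y\sin t,\qquad y' = -x\sin t + y\cos t,\qquad
L(x, y) = \frac{x'^2}{c_\mathrm{diff\_sign}} + \frac{y'^2}{c_\mathrm{same\_sign}},
\]

which is exactly what elliptic_paraboloid_loss below computes. Along the agreement diagonal \(y = x\) only the first term grows, and along the disagreement diagonal \(y = -x\) only the second term grows, so with \(c_\mathrm{diff\_sign} = 4\) and \(c_\mathrm{same\_sign} = 2\) as below, the loss grows twice as fast along the disagreement diagonal.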

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib
from matplotlib import cm


def elliptic_paraboloid_loss(x, y, c_diff_sign, c_same_sign):
    # Compute a rotated elliptic paraboloid.
    # Rotating by 45 degrees aligns the axes of the paraboloid with the
    # sign-agreement diagonal (y = x) and the sign-disagreement diagonal
    # (y = -x), so the two cases can be weighted independently.
    t = np.pi / 4
    x_rot = (x * np.cos(t)) + (y * np.sin(t))
    y_rot = (x * -np.sin(t)) + (y * np.cos(t))
    z = ((x_rot**2) / c_diff_sign) + ((y_rot**2) / c_same_sign)
    return z


c_diff_sign = 4
c_same_sign = 2

a = np.arange(-5, 5, 0.1)
b = np.arange(-5, 5, 0.1)

loss_map = np.zeros((len(a), len(b)))
for i, a_i in enumerate(a):
    for j, b_j in enumerate(b):
        loss_map[i, j] = elliptic_paraboloid_loss(a_i, b_j, c_diff_sign, c_same_sign)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') was removed in newer matplotlib
X, Y = np.meshgrid(a, b, indexing='ij')  # 'ij' indexing matches loss_map[i, j] = f(a[i], b[j])
surf = ax.plot_surface(X, Y, loss_map, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
plt.show()
Justin Fletcher
  • Thank you so much for the informative elaboration! I haven't implemented this loss function into my training model yet but it seems promising. More importantly, your explanation has successfully expanded my understanding of how loss functions work and how they can be synthesized. Really appreciate it. I will come back and tell you how it performs in the model after the training. – Ethan Chen Jun 07 '18 at 10:15
  • Hi @JustinFletcher. I assume the "x" and "y" in the sample code above represent the truth and the prediction respectively, and found that the loss function gives (-5, -5) a nonzero penalty when it should be 0. Then I realized that the loss function has an optimal point only at (0, 0), while we expect any point on y = x to be optimal. Or maybe I misinterpreted "x" and "y"? – Ethan Chen Jun 11 '18 at 03:42
  • Hey Ethan. You're right, this description wasn't complete. The function elliptic_paraboloid_loss is a scaling factor that can be applied to any loss function of your choice; in your case the standard regression loss, squared error, would be appropriate. To see that this works, note that when the values are (-5, -5) the squared error is 0, and 0 times the elliptic paraboloid weight is still 0 (see the sketch after this comment thread). Sorry for not being clear about that. – Justin Fletcher Jun 11 '18 at 04:07
  • Is there a way to make this type of scaling appropriate for convex optimization? The function above is convex, but the product of two convex functions (e.g. this weight times a squared or absolute loss) is not convex in general, so it can't be used directly as a convex loss. – Michael Dec 06 '18 at 16:13
  • @Michael Were you able to optimize it further? I am very interested in this approach as I am facing similar criteria for a loss function. – Coderji Jan 08 '19 at 11:02
  • This looks like a very interesting approach, but how would you calculate the derivative of loss with regards to error given that x and y are constituents of it? – Ludecan Jun 03 '19 at 22:10
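
To make the scaling comment above concrete, here is a minimal sketch that uses the paraboloid purely as a multiplicative weight on the squared error. It assumes elliptic_paraboloid_loss (and the NumPy import) from the answer above is in scope; the helper name weighted_squared_error is illustrative:

def weighted_squared_error(y_true, y_pred, c_diff_sign=4, c_same_sign=2):
    # Squared error scaled by the elliptic-paraboloid weight from the answer.
    weight = elliptic_paraboloid_loss(y_true, y_pred, c_diff_sign, c_same_sign)
    return weight * (y_true - y_pred) ** 2

# A perfect prediction such as (-5, -5) has squared error 0, so the weighted
# loss is 0 no matter how large the weight is.
print(weighted_squared_error(-5.0, -5.0))           # -> 0.0

# At equal distance from the origin, the weight itself is larger when the
# signs disagree than when they agree.
print(elliptic_paraboloid_loss(5.0, -5.0, 4, 2))    # roughly 25.0
print(elliptic_paraboloid_loss(5.0, 5.0, 4, 2))     # roughly 12.5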

From what I understand, your current loss function is something like:

loss = mean_square_error(y, y_pred)

What you could do is add another component to your loss: one that penalizes negative predictions and does nothing to positive ones, with a coefficient controlling how strongly it penalizes them. For that, we can use a ReLU applied to the negated input. Something like this:

(Plot of Neg_ReLU: zero for positive inputs, growing linearly for negative inputs.)

Let's call this component "Neg_ReLU". Then your loss function will be:

loss = mean_squared_error(y, y_pred) + Neg_ReLU(y_pred)

So for example, if the true value is 1 and your prediction is -1, the total error would be:

mean_squared_error(1, -1) + 1

And if your prediction is 3, the total error would be:

mean_squared_error(1, 3) + 0

(See in the plot above how Neg_ReLU(3) = 0 and Neg_ReLU(-1) = 1.)

If you want to penalize negative values more heavily, you can add a coefficient:

coeff_negative_value = 2

loss = mean_squared_error(y, y_pred) + coeff_negative_value * Neg_ReLU(y_pred)

(Plot of the scaled Neg_ReLU.)

Now negative values are penalized more heavily.

The negated-input ReLU can be built like this:

tf.nn.relu(tf.math.negative(value))


So, summing up, your total loss will be:

coeff = 1

Neg_ReLU = tf.nn.relu(tf.math.negative(y_pred))

total_loss = mean_squared_error(y, y_pred) + coeff * Neg_ReLU
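
A runnable sketch of this as a Keras-style custom loss, assuming TensorFlow 2.x; the function name mse_with_negative_penalty and the coefficient value are illustrative:

import tensorflow as tf

def mse_with_negative_penalty(coeff_negative_value=2.0):
    # Returns a loss function: mean squared error plus a penalty on negative predictions.
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # Neg_ReLU: 0 for positive predictions, |y_pred| for negative ones.
        neg_relu = tf.reduce_mean(tf.nn.relu(tf.math.negative(y_pred)))
        return mse + coeff_negative_value * neg_relu
    return loss

# Hypothetical usage with a Keras model:
# model.compile(optimizer="adam", loss=mse_with_negative_penalty(2.0))

Note that, as described above, this penalizes negative predictions regardless of the sign of the true value, so it fits best when the targets themselves are expected to be positive.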
Matias Eiletz