
I've just started with PyTorch and am trying to understand how to deal with custom loss functions, especially some non-trivial ones.

Problem 1. I'd like to encourage my NN to maximize the true positive rate and at the same time minimize the false discovery rate. For example, increase the total score by +2 for a true positive, and decrease it by 5 for a false positive.

def tp_fp_loss(yhat, y):
    total_score = 0
    for i in range(y.size(0)):  # y.size() returns a Size tuple; y.size(0) is the length
        if is_tp(yhat[i], y[i]):
            total_score += 2
        if is_fp(yhat[i], y[i]):
            total_score -= 5
    return -total_score

Problem 2. In the case where y is a list of positive and negative rewards (y = [10, -5, -40, 23, 11, -7]), encourage the NN to maximize the sum of rewards.

def max_reward_loss(yhat, y):
    r = torch.autograd.Variable(torch.Tensor(y[yhat >= .5]),
                                requires_grad=True).sum()
    return -r

Maybe I don't completely understand some autograd mechanics. The functions I implemented calculate the loss correctly, but learning with them doesn't work :( What am I doing wrong? Can anybody help me with a working solution to either of these problems?

37buEr

2 Answers


Your loss function is not differentiable: you cannot compute its gradient (go ahead and try).
You should look at something like infogain loss.
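For a concrete picture, here is a minimal sketch of what an infogain-style loss could look like in PyTorch. Note that infogain_loss is a hypothetical helper, not a built-in PyTorch API, and the gain matrix H below is just one assumed way to encode the "true positives matter 2x, false positives hurt 5x" scoring from the question while staying differentiable:

    import torch

    # Hypothetical infogain-style loss: a gain matrix H weights the
    # log-probability of each (true class, predicted class) pair, so the
    # per-class scoring idea survives but gradients can flow.
    def infogain_loss(logits, target, H):
        logp = torch.log_softmax(logits, dim=1)  # (N, C) log-probabilities
        weights = H[target]                      # (N, C): row of H per true class
        return -(weights * logp).sum(dim=1).mean()

    # Weight correct classification of negatives by 5 (a false positive
    # hurts more) and of positives by 2 (a true positive is rewarded more).
    H = torch.tensor([[5.0, 0.0],
                      [0.0, 2.0]])

    logits = torch.randn(4, 2, requires_grad=True)
    target = torch.tensor([0, 1, 1, 0])
    loss = infogain_loss(logits, target, H)
    loss.backward()  # gradients flow, unlike the counting-based loss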

Shai

@Shai already summed it up: Your loss function is not differentiable.

One way to think about it is that your loss function should be plottable, and the "downhill" slope should "roll" toward the desired model output. To plot your loss function, fix y_true=1, then plot [loss(y_pred) for y_pred in np.linspace(0, 1, 101)], where loss is your loss function, and make sure the plotted loss has the slope you want. In your case, it sounds like you want to weight the loss more strongly when the prediction is on the wrong side of the threshold. As long as you can plot it, and the slope is always downhill toward your target value (no flat spots or uphill slopes on the way from a valid prediction to the target value), your model should be able to learn from it.
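For example, here is a quick version of that check, using binary cross entropy as a stand-in for your own loss (the endpoint clipping and matplotlib plumbing are my additions):

    import numpy as np
    import matplotlib.pyplot as plt
    import torch
    import torch.nn.functional as F

    def loss(y_pred, y_true=1.0):
        # binary cross entropy as a placeholder; swap in your own loss here
        return F.binary_cross_entropy(torch.tensor([y_pred]),
                                      torch.tensor([y_true])).item()

    y_preds = np.linspace(0.001, 0.999, 101)  # avoid log(0) at the endpoints
    plt.plot(y_preds, [loss(p) for p in y_preds])
    plt.xlabel('y_pred')
    plt.ylabel('loss at y_true = 1')
    plt.show()

If the curve slopes steadily downhill toward y_pred = 1 with no flat regions, gradient descent has something to follow.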

Also note that if you're just trying to account for a business objective that prioritizes precision over recall, you could train to convergence with cross entropy or some other well-known loss function, and then tune your model's threshold based on your use case. A higher threshold normally prioritizes precision, and a lower threshold normally prioritizes recall. After training, you can evaluate your model at a variety of thresholds and choose the most appropriate one.
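A minimal sketch of that threshold sweep, assuming scikit-learn is available; probs and y_true are placeholders for your own validation predictions and labels:

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    probs = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])  # model probabilities (placeholder)
    y_true = np.array([0, 0, 1, 1, 0, 1])               # true labels (placeholder)

    for threshold in np.linspace(0.1, 0.9, 9):
        preds = (probs >= threshold).astype(int)
        print(f'threshold={threshold:.1f}  '
              f'precision={precision_score(y_true, preds, zero_division=0):.2f}  '
              f'recall={recall_score(y_true, preds):.2f}')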

colllin