
I searched the forum and found the thread "Two ways around -inf", but it does not cover my question.

In week 3 of a Machine Learning class I am getting -Inf when taking log(0), which later turns into NaN. The NaN means the sum formula gives no answer, so there is no scalar for J (a cost function that is the result of matrix math).

Here is a test of my function

>> sigmoid([-100;0;100])
ans =
3.7201e-44
5.0000e-01
1.0000e+00

This is as expected, but the hypothesis requires 1 - sigmoid:

>> 1-ans
ans =
1.00000
0.50000
0.00000

and log(0) gives -Inf:

>> log(ans)
ans =
0.00000
-0.69315
-Inf

The -Inf rows do not add to the cost function, but the -Inf carries through to NaN, and I do not get a result. I cannot find any material on -Inf, but I am thinking there is a problem with my sigmoid function.
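I suspect the NaN appears when a -Inf term gets multiplied by a zero coefficient somewhere in the sum; a minimal check of that guess (not something I have confirmed in my assignment code):

>> 0 * log(0)                    % a zero-weighted -Inf term
ans = NaN
>> sum([0; NaN; -0.69315])       % any NaN in the sum makes the whole sum NaN
ans = NaN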

Can you provide any direction?

Edward h
  • The -Inf rows probably should be contributing to the cost function. Mathematically, sigmoid(100) is just slightly smaller than 1, but the difference is too small for the precision of the floating-point representation, so 1 - ans rounds to exactly 0. Mathematically, 1 - ans is a very, very small positive number, and log(1 - ans) is a very large negative number, so it certainly should be impacting the cost function. – Martin Cook Aug 17 '18 at 12:46
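To make the precision point concrete, here is a quick check in Octave (a sketch using the same sigmoid formula as in the answers below): the true gap between sigmoid(100) and 1 is about 3.7e-44, far below double-precision eps, so the subtraction 1 - g loses it entirely.

>> eps                       % spacing of doubles around 1
ans = 2.2204e-16
>> g = 1 ./ (1 + exp(-100));
>> g == 1                    % the ~3.7e-44 gap is below eps, so g rounds to exactly 1
ans = 1
>> 1 - g                     % hence 1 - g is exactly 0, and log(1 - g) is -Inf
ans = 0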

3 Answers


The typical way to avoid infinity in these cases is to add eps to the operand:

log(ans + eps)

eps is a very small value (the double-precision machine epsilon, about 2.2e-16), so it won't noticeably affect the output unless ans is zero (or comparably tiny):

>> z = [-100;0;100];
>> g = 1 ./ (1+exp(-z));
>> log(1-g + eps)
ans =
    0.0000
   -0.6931
  -36.0437
Cris Luengo
  • My sigmoid function is g = 1./(1+exp(-z)); This is very similar to spoonless's g = 1 ./ (1 + e.^-z); Is there something different about e.^-z and exp(-z) that returns a value with differing precision? Is there a setting in Octave or elsewhere that is impacting this? Thanks for the suggestions. (Sorry for the multiple posts - it is due to fat fingers) – Edward h Aug 18 '18 at 00:52
  • @Edwardh: you can delete your own comments by clicking on the "x" button that appears at the end of it if you hover your mouse over the comment. – Cris Luengo Aug 18 '18 at 03:30
  • @Edwardh: MATLAB doesn't know `e`, it's probably an Octave extension. However, `exp(-z)` and `exp(1).^-z` yield nearly identical values (`3.7201e-44`), which differ by ~`4e-58`. In any case, `1-g` is identical to `0` in both cases. – Cris Luengo Aug 18 '18 at 03:33

Adding to the answers here, I really do hope you can provide some more context to your question (in particular, what you are actually trying to do).

I will go out on a limb and guess the context, just in case this is useful. You are probably doing machine learning, and trying to define a cost function based on the negative log likelihood of a model, and then trying to differentiate it to find the point where this cost is at its minimum.

In general, for a reasonable model with a useful likelihood that adheres to Cromwell's rule, you shouldn't have these problems, but in practice it happens. Presumably, in the process of trying to calculate the negative log likelihood of a zero probability you get Inf, and trying to calculate a differential between two such points produces Inf / Inf = NaN.
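As a minimal illustration of that propagation (just a sketch of the arithmetic; the exact expressions in your code are a guess on my part):

>> -log(0)        % negative log likelihood of a zero probability
ans = Inf
>> Inf / Inf      % a ratio of two infinite quantities
ans = NaN
>> 0 * Inf        % a zero-weighted infinite term inside a sum
ans = NaN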

In this case, this is an edge case, and generally in computer science edge cases need to be spotted as exceptional circumstances and dealt with appropriately. The reality is that you can reasonably expect that Inf isn't going to be your function's minimum! Therefore, whether you remove it from the calculations or replace it by a very large number (whether arbitrarily or via machine precision) doesn't really make a difference.

So in practice you can do either of the two things suggested by others here, or even just detect such instances and skip them from the calculation. The practical result should be the same.
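For instance, a rough sketch of the "detect and skip" option (the variable names here are made up, and I'm assuming the sigmoid from your question):

z = [-100; 0; 100];
terms = log(1 - 1 ./ (1 + exp(-z)));       % contains a -Inf where 1 - sigmoid underflows to 0
J_partial = -sum(terms(isfinite(terms)));  % keep only the finite terms; about 0.6931, no NaN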

Tasos Papastylianou

-Inf means negative infinity, which is the correct answer here because log(0) is minus infinity by definition.

The easiest thing to do is to check your intermediate results, and if a number is below some threshold (like 1e-12), just set it to that threshold. The answers won't be perfect, but they will still be pretty close.

Using the following as the sigmoid function:

function g = sigmoid(z)
  % logistic function; e is Octave's built-in constant exp(1)
  g = 1 ./ (1 + e.^-z);
end

Then the following code runs with no issues. Choose the threshold value in the max statement to be less than the expected noise in your measurements, and then you're good to go.

>> a = sigmoid([-100, 0, 100])
a =

   3.7201e-44   5.0000e-01   1.0000e+00

>> b = 1-a
b =

   1.00000   0.50000   0.00000

>> c = max(b, 1e-12)
c =

   1.0000e+00   5.0000e-01   1.0000e-12

>> d = log(c)
d =

    0.00000   -0.69315  -27.63102
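
To close the loop on the original problem, here is one way the clamped values could feed into a cost; the labels y below are made-up example values and the cross-entropy form is an assumption about what the OP's J looks like.

y = [0, 1, 1];                     % hypothetical example labels
h = sigmoid([-100, 0, 100]);       % hypothesis values, as above
J = -mean(y .* log(max(h, 1e-12)) + (1 - y) .* log(max(1 - h, 1e-12)))
% J comes out to roughly 0.2310 -- a finite scalar, with no -Inf or NaN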
Spoonless