taking log of very small values using numpy/scipy in Python

Question

I have an Nx1 array that represents a probability distribution, i.e. its elements sum to 1. It is stored as a regular numpy array. Since N might be relatively large, e.g. 10 or 20, many of the individual elements are pretty close to 0. When I take log(my_array), I get the error "FloatingPointError: invalid value encountered in log". Note that this happens after intentionally setting seterr(invalid='raise') in numpy.

How can I deal with this numerical issue? I'd like to represent vectors corresponding to a probability distribution and take their log without the values rounding to 0, since I then end up taking log(0), which raises the error.

thanks.

Probability of zero is a special case, why would you consider it to be the same as non-zero probabilities? Why not simply filter it out of the data and work with the non-zero only? — S.Lott, Nov 17 '10 at 16:28
Have you double checked that all the values in the distribution are really positive? No negative values and no values that are exactly zero? Really small values should not matter. — Sven Marnach, Nov 18 '10 at 15:17
Same issue as: http://stackoverflow.com/questions/3704570/in-python-small-floats-tending-to-zero — monkut, May 11 '12 at 04:22
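
Following up on Sven Marnach's comment, it may be worth checking the data before changing anything. In NumPy, log(0) is reported under the 'divide' error category and returns -inf, while 'invalid' is reported for negative or NaN inputs. A FloatingPointError under invalid='raise' therefore suggests negative or NaN entries, possibly tiny negatives left over from round-off in an earlier step. A minimal sketch of such a check; the helper name describe_problem_entries and the sample array p are made up for illustration:

```python
import numpy as np

def describe_problem_entries(p):
    """Count the entries of p whose log is not finite (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    return {
        "negative": int(np.sum(p < 0)),
        "zero": int(np.sum(p == 0)),
        "nan": int(np.sum(np.isnan(p))),
    }

# Assumed example data: one exact zero and one tiny negative value.
p = np.array([0.5, 0.5, 0.0, -1e-17])
counts = describe_problem_entries(p)
print(counts)

# With invalid='raise', only the negative entry raises FloatingPointError.
# The exact zero belongs to the 'divide' category and simply yields -inf.
with np.errstate(invalid="raise", divide="ignore"):
    log_of_zero = np.log(np.array([0.0]))  # -inf, no exception
    try:
        np.log(np.array([-1e-17]))
        raised = False
    except FloatingPointError:
        raised = True
print(log_of_zero, raised)
```

If the counts show negative entries, the fix probably belongs in whatever step produced them rather than in the log call itself.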

score 3 · Answer 1 · answered Nov 17 '10 at 16:37

You can just drop the tails according to the accuracy you need.

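import numpy as np

eps = 1e-50                # floor; choose it to match the accuracy you need
array[array < eps] = eps   # clamp the tails in place (this also replaces exact zeros)
np.log(array)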
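
A variation on the clamp above that leaves the input array untouched, since np.clip returns a new array. This is only a sketch: the helper name safe_log is hypothetical, and the default floor of 1e-50 is carried over from the snippet above rather than being a recommended value.

```python
import numpy as np

def safe_log(p, eps=1e-50):
    """Return log(p) with entries below eps raised to eps (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    # np.clip with a_max=None only applies the lower bound.
    return np.log(np.clip(p, eps, None))

# Assumed example data containing an exact zero.
p = np.array([0.7, 0.3, 0.0])
logp = safe_log(p)
print(logp)
```

Note that clamping also silently hides negative entries, so it can mask an upstream bug.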

gerry

1,539
1
12
22

score 2 · Answer 2 · answered Nov 17 '10 at 16:32

What counts as pretty close to zero?

>>> np.log(0)
-inf
>>> 0.*np.log(0)
nan
>>> np.log(1e-200)
-460.51701859880916
>>> 1e-200*np.log(1e-200)
-4.6051701859880914e-198

One solution is to add a small positive number to all probabilities to restrict them to be far enough away from zero.

The second solution is to handle zeros explicitly, for example by replacing 0.*np.log(0) with zeros in the resulting array, or by including only points that have nonzero probability in the probability array.
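
The second solution can be sketched as follows, assuming the quantity of interest is p*log(p), as in an entropy calculation. That is an assumption; the question does not say what the logs are used for. The helper name plogp and the sample array are made up for illustration.

```python
import numpy as np

def plogp(p):
    """Elementwise p*log(p), using the convention 0*log(0) = 0 (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    mask = p > 0
    # log is evaluated only where p is strictly positive, so log(0) never occurs.
    out[mask] = p[mask] * np.log(p[mask])
    return out

# Assumed example data: a distribution with one zero-probability outcome.
p = np.array([0.5, 0.5, 0.0])
entropy = -plogp(p).sum()
print(entropy)
```

If SciPy is available, scipy.special.xlogy(p, p) applies the same 0*log(0) = 0 convention without the explicit mask.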

score 1 · Answer 3 · answered Nov 17 '10 at 16:23

How 'pretty close' to 0 are they? Python seems happy taking the log of very small numbers, such as 10^-28:

>>> log(0.0000000000000000000000000001)
-64.472382603833282

Also, why are you taking logs? What do you plan to do with them once you've taken them?

Spacedman

92,590
12
140
224

score 0 · Answer 4 · answered Nov 17 '10 at 17:25

Depending on what you're doing afterwards, you could use a different transform that doesn't explode on zero values like log does. Perhaps a sigmoid function or something else with a well-defined Jacobian.

If you're just looking to visualize the data, you could always add some tiny value before you take the log.
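
Adding a tiny value before taking the log might look like the sketch below. The renormalization step is an addition that the answer does not mention; it keeps the shifted vector summing to 1. The helper name smoothed_log and the default offset of 1e-12 are assumptions chosen for illustration.

```python
import numpy as np

def smoothed_log(p, tiny=1e-12):
    """Log of p after adding a small offset and renormalizing (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = p + tiny      # shift every entry away from zero
    q = q / q.sum()   # renormalize so q is still a distribution (assumed step)
    return np.log(q)

# Assumed example data containing an exact zero.
p = np.array([0.9, 0.1, 0.0])
logq = smoothed_log(p)
print(logq)
```

Unlike clamping, the offset changes every entry slightly, which is usually acceptable for plotting but may matter for downstream calculations.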
