5

I have an Nx1 array that represents a probability distribution, i.e. its elements sum to 1. It is stored as a regular numpy array. Since N might be relatively large, e.g. 10 or 20, many of the individual elements are pretty close to 0. When I take log(my_array), I get the error "FloatingPointError: invalid value encountered in log". Note that this happens after intentionally setting seterr(invalid='raise') in numpy.

How can I deal with this numerical issue? I'd like to represent vectors corresponding to a probability distribution and take their log without the values rounding to 0, since I then end up taking log(0), which raises the error.

thanks.

  • Probability of zero is a special case, why would you consider it to be the same as non-zero probabilities? Why not simply filter it out of the data and work with the non-zero only? – S.Lott Nov 17 '10 at 16:28
  • Have you double checked that all the values in the distribution are really positive? No negative values and no values that are exactly zero? Really small values should not matter. – Sven Marnach Nov 18 '10 at 15:17
  • Same issue as: http://stackoverflow.com/questions/3704570/in-python-small-floats-tending-to-zero – monkut May 11 '12 at 04:22
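
A quick way to act on the check suggested in the comments is sketched below. The helper names `diagnose` and `log_raises` are made up for illustration. The sketch relies on NumPy reporting log(0) as a "divide by zero" condition and the log of a negative number or NaN as an "invalid value" condition, so an "invalid value" error may point at slightly negative entries (for example from round-off) rather than at exact zeros.

```python
import numpy as np

def diagnose(p):
    """Count entries that can make np.log misbehave (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    return {
        "negative": int(np.sum(p < 0)),
        "zero": int(np.sum(p == 0)),
        "nan": int(np.isnan(p).sum()),
        "sum": float(p.sum()),
    }

def log_raises(value):
    """Return True if np.log raises FloatingPointError with invalid='raise'."""
    # divide='ignore' silences the separate divide-by-zero condition for log(0)
    with np.errstate(invalid="raise", divide="ignore"):
        try:
            np.log(np.array([value]))
        except FloatingPointError:
            return True
    return False

report = diagnose([0.5, 0.5, 0.0, -1e-17])
print(report)
print(log_raises(0.0), log_raises(-1e-17))
```

If `report["negative"]` or `report["nan"]` is nonzero, fixing the upstream computation is probably a better first step than clamping.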

4 Answers

3

You can just drop the tails according to the accuracy you need.

eps = 1e-50
array[array<eps]=eps
log(array)
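
The same idea can be written without modifying the input in place. This is only a sketch: `safe_log` is a hypothetical name, and the default `eps` simply mirrors the value above. Note that the clamped vector no longer sums exactly to 1, and that clamping also maps negative entries to `eps`, which can hide an upstream bug.

```python
import numpy as np

def safe_log(p, eps=1e-50):
    """Log of p with entries below eps clamped to eps (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    # np.clip returns a new array, so the caller's data is left untouched
    return np.log(np.clip(p, eps, None))

logs = safe_log([0.7, 0.3, 0.0])
print(logs)
```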
gerry
2

What counts as pretty close to zero?

>>> np.log(0)
-inf
>>> 0.*np.log(0)
nan
>>> np.log(1e-200)
-460.51701859880916
>>> 1e-200*np.log(1e-200)
-4.6051701859880914e-198

One solution is to add a small positive number to all probabilities to restrict them to be far enough away from zero.

The second solution is to handle zeros explicitly: for example, replace 0.*np.log(0) with zeros in the resulting array, or include only the points that have nonzero probability in the probability array.
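
A minimal sketch of the second solution, assuming the goal is an entropy-style sum of p*log(p) terms (the question does not say what the logs are used for). The name `plogp` is hypothetical. Zero-probability entries contribute 0, following the usual convention that p*log(p) tends to 0 as p tends to 0.

```python
import numpy as np

def plogp(p):
    """Elementwise p * log(p), with 0 where p == 0 (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    mask = p > 0
    # np.log is evaluated only on strictly positive entries
    out[mask] = p[mask] * np.log(p[mask])
    return out

terms = plogp([0.5, 0.5, 0.0])
entropy = -terms.sum()
print(terms, entropy)
```

Because the log is never evaluated at zero, this works with seterr(invalid='raise') or seterr(divide='raise') left on.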

Josef
1

How 'pretty close' to 0 are they? Python seems happy taking log of 10^-very large:

>>> log(0.0000000000000000000000000001)
-64.472382603833282

Also, why are you taking logs? What do you plan to do with them once you've taken them?

Spacedman
0

Depending on what you're doing afterwards, you could use a different transform that doesn't explode on zero values like log does. Perhaps a sigmoid function or something else with a well-defined Jacobian.

If you're just looking to visualize the data, you could always add some tiny value before you take the log.

Mark