
I use the following code to return Shannon's Entropy on an array that represents a probability distribution.

import numpy as np
A = np.random.randint(10, size=10)

pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(pA))

This works fine if the array doesn't contain any zeros.

Example:

Input: [2 3 3 3 2 1 5 3 3 4]
Output: 3.2240472715

However, if the array does contain zeros, the Shannon entropy calculation produces nan.

Example:

Input: [7 6 6 8 8 2 8 3 0 7]
Output: nan

I do get two RuntimeWarnings:

1) RuntimeWarning: divide by zero encountered in log2

2) RuntimeWarning: invalid value encountered in multiply

Is there a way to alter the code so that it handles zeros? I'm just not sure whether removing them completely will influence the result. Specifically, whether the variation would be greater due to the greater frequency in the distribution.

  • Removing the zeros in the later part of the calculation does not amount to ignoring the zeroes. The influence of the zero is from the `pA = A / A.sum()`. The result of `A.sum()` is smaller due to the zeroes being present. – fasta Apr 23 '18 at 05:17

2 Answers


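I think you want to use `np.nansum`, which counts nans as zero. Each zero probability produces 0 * log2(0) = 0 * -inf = nan, and treating that term as 0 matches the usual convention that 0 log 0 = 0 in the entropy sum: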

import numpy as np
A = np.random.randint(10, size=10)
pA = A / A.sum()
# Zero entries give 0 * -inf = nan; nansum treats those terms as 0.
Shannon2 = -np.nansum(pA*np.log2(pA))
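
The snippet above still emits the two RuntimeWarnings from the question. Here is a minimal sketch that silences them locally, assuming the convention that 0 * log2(0) counts as 0. The function name `shannon_entropy_nansum` is hypothetical and chosen only for illustration; `np.errstate` and `np.nansum` are standard NumPy.

```python
import numpy as np

def shannon_entropy_nansum(counts):
    """Entropy in bits of a count vector, treating 0 * log2(0) as 0."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    # log2(0) is -inf and 0 * -inf is nan; silence both warnings here only.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p * np.log2(p)
    # nansum skips the nan terms that come from zero probabilities.
    return -np.nansum(terms)

print(shannon_entropy_nansum([7, 6, 6, 8, 8, 2, 8, 3, 0, 7]))
```

One caveat with this approach: `np.nansum` also hides nans that were already present in the input, so it is probably safest on data known to contain only finite, non-negative counts.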
basaundi
  • For Python 2.7 this code also needs `from __future__ import division` to force non-integer division. See: https://stackoverflow.com/questions/1267869/how-can-i-force-division-to-be-floating-point-division-keeps-rounding-down-to-0 – fasta Apr 23 '18 at 05:15

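The easiest and most commonly used way is to ignore the zero probabilities and calculate the Shannon entropy on the remaining values. This does not change the result, because p * log2(p) tends to 0 as p tends to 0, so a zero-probability outcome contributes nothing to the sum.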

Try the following:

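import numpy as np
A = np.array([1.0, 2.0, 0.0, 5.0, 0.0, 9.0])
# Drop zero entries with a boolean mask. np.array(filter(...)) does not
# work on Python 3, where filter returns an iterator rather than a list.
A = A[A != 0]
pA = A / A.sum()
Shannon2 = -np.sum(pA * np.log2(pA))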
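
To address the worry in the question about whether dropping zeros shifts the result, here is a small check, written as a sketch under the same 0 log 0 = 0 convention. The helper name `shannon_entropy_masked` is hypothetical. Zeros add nothing to `A.sum()`, so the non-zero probabilities are the same whether or not the zeros are in the array.

```python
import numpy as np

def shannon_entropy_masked(counts):
    """Entropy in bits, computed over the non-zero probabilities only."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    # Keep only strictly positive probabilities so log2 never sees 0.
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log2(nonzero))

with_zeros = shannon_entropy_masked([1.0, 2.0, 0.0, 5.0, 0.0, 9.0])
without_zeros = shannon_entropy_masked([1.0, 2.0, 5.0, 9.0])
print(np.isclose(with_zeros, without_zeros))  # prints True
```

Masking after normalising avoids both RuntimeWarnings, since `np.log2` is never evaluated at zero.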
Arpit Kathuria