
Using some mathematical tricks and MATLAB, we can easily calculate the entropy of a given input. For instance:

x = [10 25 4 10 9 4 4]
[a,b] = hist(x, unique(x))   % a: counts of each unique value, b: the unique values themselves

x =

    10    25     4    10     9     4     4

a =

     3     1     2     1

b =

     4     9    10    25

My question is the following: because we are using the log function, is it advisable to add a small constant inside the logarithm to keep the calculation well defined? For instance, should we use +eps? As an example:

probabilities = a./numel(x)

probabilities =

    0.4286    0.1429    0.2857    0.1429

-sum(probabilities .* log2(probabilities))

ans =

    1.8424

-sum(probabilities .* log2(probabilities + eps))

ans =

    1.8424
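
Note that with unique(x) as the bin centres no bin can be empty, which is why the two results above agree. The zero-probability problem only shows up if the histogram is taken over a fixed bin set that the data can miss. A minimal sketch of that situation (the bin centre 17 below is made up for illustration):

x = [10 25 4 10 9 4 4];
centres = [4 9 10 17 25];        % 17 never occurs in x, so its count is 0
counts = hist(x, centres);       % [3 1 2 0 1]
p = counts ./ numel(x);          % one probability is exactly 0

p .* log2(p)                     % 0 * -Inf = NaN, so -sum(p .* log2(p)) is NaN
-sum(p .* log2(p + eps))         % eps keeps that term finite; it contributes 0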
  • Sorry, instead of sum I should divide by the length, but it does not matter in this case –  Jun 06 '14 at 18:03
  • Since in your edit you removed your question, there is nothing here to answer. But `log2(0) = -Inf`, so yes, to avoid trouble an epsilon can be a good idea, because with numerical imprecision you will lose some information. If you work with 4 digits, eps is enough. – Vuwox Jun 06 '14 at 18:23
  • But is it advised, because we are using the log function, to use +eps inside the logarithm? That is the question, sir –  Jun 06 '14 at 18:27
  • If your probability is zero, it sounds like that element isn't supposed to be included, so you should simply not sum over it. Epsilon avoids the log2 infinity, but normally you shouldn't have prob = 0. – Vuwox Jun 06 '14 at 18:29
  • Take the example of a word count: if a word appears with 0 probability, that word isn't needed in your set, so you don't need to compute a probability for it and your entropy estimation shouldn't include that element. So log2 isn't supposed to receive 0, and there is no problem. The probabilities should sum to 1, and in the case of a single value, prob = 1 and log2(1) = 0. An entropy of 0 is correct for a single element; there are no other elements to constrain it. – Vuwox Jun 06 '14 at 18:32
  • You can maybe read the [coin toss example](http://en.wikipedia.org/wiki/Entropy_(information_theory)) to understand that the probability is between 0 and 1; it shows that H(x) is a parabola-shaped curve in that case. – Vuwox Jun 06 '14 at 18:35
  • @AlexandreBizeau have a nice night, thanks for your effort –  Jun 06 '14 at 18:37
  • When I have calculated joint image entropy, I usually skip over any probability bins that are 0: I find any locations in the PDF that are 0 and just skip those in the sum (a short sketch of this masking idea follows after the comments). Take a look here: http://stackoverflow.com/questions/23691398/mutual-information-of-two-images-matlab – rayryeng Jun 06 '14 at 22:19
  • But I can't see the reason for the downvoting –  Jun 07 '14 at 10:15
  • @datodatuashvili I didn't downvote. This is a legitimate question. However, your phrasing about whether or not to add a constant to ensure the proper domain was not very clear in the question. I have modified your phrasing to reflect this. – rayryeng Jun 08 '14 at 02:20
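
A rough sketch of the skip-the-zero-bins idea from the comments above (the variable names are illustrative, not taken from the linked answer):

x = [10 25 4 10 9 4 4];
counts = hist(x, unique(x));
p = counts ./ numel(x);

nonzero = p > 0;                            % mask out empty bins, if any
H = -sum(p(nonzero) .* log2(p(nonzero)))    % entropy over the non-empty bins only

Since terms with p = 0 contribute nothing to the Shannon sum (0*log2(0) is taken as 0 by convention), masking gives the same entropy as the +eps trick without perturbing the probabilities.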

0 Answers