I'm trying to calculate the entropy of English using the following Java function:

public static void calculateEntropy()
{
    for(int i = 0; i < letterFrequencies[i]; i++)
    {
        entropy += letterFrequencies[i] * (Math.log(letterFrequencies[i])/Math.log(2));
    }
    entropy *= -1;
}

The formula I'm using requires log base 2, but Java only provides natural log and log base 10, so I'm using the change-of-base formula to get the log base 2 of letterFrequencies[i]. I don't know whether I'm implementing it correctly: I'm expecting an answer close to 4.18 but am getting roughly 0.028 instead.

Nick Gilbert

2 Answers

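The problem is in the for loop's stop condition:

i < letterFrequencies[i] should be i < letterFrequencies.length. As written, the condition compares the index with a frequency value, so if the frequencies are fractions below 1, the loop exits after the first element.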

Furthermore, I'd use Guava's DoubleMath.log2() method, which is optimized as @LutzL suggested.
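
A minimal sketch of the corrected loop, assuming letterFrequencies is a double[] of probabilities. The class name EntropyExample, the array parameter, and the returned value are illustrative choices, not taken from the question. Zero frequencies are skipped because Math.log(0) is negative infinity, and multiplying that by 0 gives NaN.

```java
final class EntropyExample {
    // Natural log of 2, used for the change of base: log2(x) = ln(x) / ln(2).
    private static final double LN_2 = Math.log(2);

    // Shannon entropy in bits: H = -sum(p * log2(p)) over all frequencies p.
    // Assumes each entry is a probability in [0, 1].
    public static double calculateEntropy(double[] letterFrequencies) {
        double entropy = 0.0;
        // Iterate over the whole array, not until i reaches a frequency value.
        for (int i = 0; i < letterFrequencies.length; i++) {
            double p = letterFrequencies[i];
            if (p > 0) { // skip zeros: 0 * log(0) would produce NaN
                entropy -= p * (Math.log(p) / LN_2);
            }
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Hypothetical input: four equally likely symbols (about 2 bits).
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        System.out.println(calculateEntropy(uniform));
    }
}
```

With a 26-entry table of English letter frequencies, the same method should give a value in the range the question expects, provided the table holds probabilities rather than percentages.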

fps

Mathematically, the change-of-base implementation is correct, but it doesn't work out in code. For integer arguments, you could instead write your own implementation, which runs significantly faster:

public static int log2(int n){
    if(n <= 0) throw new IllegalArgumentException();
    return 31 - Integer.numberOfLeadingZeros(n);
}

Source: How do you calculate log base 2 in Java for integers?
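
As a quick check of what this method returns, here is a small sketch (the class name Log2Demo and the sample inputs are illustrative). It yields the floor of the base-2 logarithm of a positive int, so it suits integer inputs and would not directly replace Math.log(x)/Math.log(2) for fractional frequencies.

```java
final class Log2Demo {
    // Floor of log2(n) for positive ints: the index of the highest set bit.
    public static int log2(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(log2(8));   // 3, an exact power of two
        System.out.println(log2(10));  // 3, the fractional part is dropped
        // Floating-point change of base for comparison (about 3.32).
        System.out.println(Math.log(10) / Math.log(2));
    }
}
```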

Community
flaghacker