I have a list of values, `l = ["alpha", "beta", "beta", "alpha", "gamma", "alpha", "alpha"]`. I have a formula for computing a kind of probability on this list (it is high when the list contains many different values and low when it contains only a few kinds of values):
$$ p = -\sum_{i=1}^{m} f_i \log_m f_i $$
where $m$ is the length of the list and $f_i$ is the frequency of the $i$th element of the list.
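As a worked example, here is the formula evaluated on the list above. This assumes (the question does not state it) that the sum runs over the three distinct values, that $f_i$ is the relative frequency (count divided by `len(l)`), and that the log base is the number of distinct values, as in the Python attempt:

$$
f_{\text{alpha}} = \tfrac{4}{7}, \quad f_{\text{beta}} = \tfrac{2}{7}, \quad f_{\text{gamma}} = \tfrac{1}{7}
$$
$$
p = -\left(\tfrac{4}{7}\log_3\tfrac{4}{7} + \tfrac{2}{7}\log_3\tfrac{2}{7} + \tfrac{1}{7}\log_3\tfrac{1}{7}\right) = \frac{0.9557}{\ln 3} \approx \frac{0.9557}{1.0986} \approx 0.870
$$

Using the list length 7 as the base instead would give $0.9557 / \ln 7 \approx 0.491$.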
I want to code this in Python as follows:
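    from math import log
    from collections import Counter

    l = ["alpha", "beta", "beta", "alpha", "gamma", "alpha", "alpha"]
    # Counter(l).values() yields the raw count of each distinct value;
    # the log base is the number of distinct values, len(set(l))
    p = -sum([loc * log(loc, len(set(l))) for loc in Counter(l).values()])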
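One likely problem with the attempt is that `Counter(l).values()` gives raw counts (4, 2, 1) rather than frequencies, so every term uses a number greater than 1. Below is a minimal sketch that divides by the list length first. It assumes $f_i$ means the relative frequency of each distinct value and keeps the attempt's choice of the number of distinct values as the log base; the helper name `normalized_entropy` is hypothetical.

```python
from collections import Counter
from math import log


def normalized_entropy(values):
    """Entropy of `values` normalized to [0, 1] (hypothetical helper).

    Assumes f_i = count / len(values), summed over the distinct values,
    with the number of distinct values as the log base.
    """
    counts = Counter(values)
    k = len(counts)  # number of distinct values
    if k <= 1:
        # Empty list or a single kind of value: no diversity,
        # and a log base of 1 would be undefined anyway.
        return 0.0
    n = len(values)
    # Generator expression: no need to build an intermediate list for sum()
    return -sum((c / n) * log(c / n, k) for c in counts.values())


l = ["alpha", "beta", "beta", "alpha", "gamma", "alpha", "alpha"]
print(round(normalized_entropy(l), 3))  # prints 0.87
```

With this base, a list where every distinct value occurs equally often scores 1, and a list with only one kind of value scores 0.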
But I suspect this is not the right way. Is there a better approach? Also, I do not understand the negative sign in the formula: what is the explanation for it?
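A short sketch of why the minus sign is there, assuming $f_i$ are relative frequencies (so $0 < f_i \le 1$) and the log base $m$ is greater than 1:

$$
0 < f_i \le 1 \;\Rightarrow\; \log_m f_i \le 0 \;\Rightarrow\; f_i \log_m f_i \le 0 \;\Rightarrow\; p = -\sum_{i} f_i \log_m f_i \ge 0
$$

Without the minus sign every term would be zero or negative, so the sign simply makes $p$ non-negative. It is the same sign as in the definition of Shannon entropy, since $-\log_m f_i = \log_m (1/f_i)$ grows as a value becomes rarer.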