
What is the difference between arithmetic and geometric normalized mutual information? I have:

    In [4]: real
    Out[4]:
    array([0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

    In [6]: test
    Out[6]:
    array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Now I want to calculate the normalized mutual information, but it is acting kind of weird:

    In [13]: normalized_mutual_info_score(real.astype(int),test.astype(int),average_method='arithmetic')
    Out[13]: 6.422893887289432e-16

    In [14]: normalized_mutual_info_score(real.astype(int),test.astype(int),average_method='geometric')
    Out[14]: 1.0

The main question is: why?

Answer


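It comes down to how the denominator is computed. From the scikit-learn documentation:

> For normalized mutual information and adjusted mutual information, the normalizing value is typically some generalized mean of the entropies of each clustering. Various generalized means exist, and no firm rules exist for preferring one over the others. The decision is largely a field-by-field basis; for instance, in community detection, the arithmetic mean is most common. Each normalizing method provides “qualitatively similar behaviors”. In our implementation, this is controlled by the average_method parameter.

In your case `test` contains only one label, so its entropy is 0, and the mutual information between `real` and `test` is also 0 (a constant labelling carries no information about the other one). With `average_method='arithmetic'` the denominator is (H(real) + 0) / 2, which is positive, so the score is 0 up to floating-point noise (the 6.4e-16 you see). With `average_method='geometric'` the denominator is sqrt(H(real) × 0) = 0, so the score is 0/0, which is undefined; the 1.0 comes from how the implementation handles that degenerate case, not from the two labellings agreeing.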
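To make the two denominators concrete, here is a minimal sketch in plain Python (no scikit-learn) that recomputes the entropies, the mutual information and both generalized means for the arrays from the question. The helpers `entropy`, `mutual_info` and `normalizer` are hypothetical illustrations, not scikit-learn's API. The last loop rests on an assumption used only to reproduce the question's numbers: that the computed MI carried a rounding residue of about one machine epsilon (inferred from 6.42e-16 × H(real)/2 ≈ 2.22e-16) and that the scikit-learn version in use clamped the denominator to epsilon. Other releases may handle this case differently.

```python
import math
import sys
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of one labelling."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())


def mutual_info(a, b):
    """Mutual information (natural log) between two labellings of the same samples."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in joint.items())


def normalizer(h1, h2, method):
    """Generalized mean of the two entropies, used as the NMI denominator."""
    if method == "arithmetic":
        return (h1 + h2) / 2
    if method == "geometric":
        return math.sqrt(h1 * h2)
    raise ValueError(f"unknown method: {method}")


real = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0,
        0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
test = [0] * 34

h_real, h_test = entropy(real), entropy(test)  # ~0.6914 and 0.0
mi = mutual_info(real, test)                   # 0.0: test carries no information

# Exact arithmetic: the arithmetic mean is > 0, the geometric mean is 0 (so 0/0).
for method in ("arithmetic", "geometric"):
    d = normalizer(h_real, h_test, method)
    print(method, d, mi / d if d > 0 else float("nan"))

# Assumption: MI off by one epsilon of rounding noise, denominator clamped to eps.
eps = sys.float_info.epsilon
for method in ("arithmetic", "geometric"):
    print(method, eps / max(normalizer(h_real, h_test, method), eps))
```

Under these assumptions the second loop gives roughly 6.42e-16 for the arithmetic mean and exactly 1.0 for the geometric mean, matching the outputs in the question. With exact arithmetic, both the MI and the geometric denominator are 0, so the geometric score is better read as undefined than as a perfect match.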