How to convert log probability into simple probability between 0 and 1 values using python

Question

I am using a Gaussian mixture model for speaker identification. I use this code to predict the speaker for each voice clip.

for path in file_paths:
    path = path.strip()
    print(path)
    sr, audio = read(source + path)
    vector = extract_features(audio, sr)
    log_likelihood = np.zeros(len(models))

    for i in range(len(models)):
        gmm1 = models[i]  # check against each speaker model in turn
        scores = np.array(gmm1.score(vector))
        log_likelihood[i] = scores.sum()
        print(log_likelihood)
    # the best-scoring model wins
    winner = np.argmax(log_likelihood)
    print("\tdetected as - ", speakers[winner])

It gives me output like this:

[ 311.79769716    0.            0.            0.            0.        ]
[  311.79769716 -5692.56559902     0.             0.             0.        ]
[  311.79769716 -5692.56559902 -6170.21460788     0.             0.        ]
[  311.79769716 -5692.56559902 -6170.21460788 -6736.73192695     0.        ]
[  311.79769716 -5692.56559902 -6170.21460788 -6736.73192695 -6753.00196447]
    detected as -  bart

Here the score function gives me the log probability for each speaker. Now I want to choose a threshold value, and for that I need to convert these log probability values into ordinary probabilities (between 0 and 1). How can I do that? I am using Python.

I can't think of a good reason you would need to convert log probabilities back, though. Log probabilities are easier to work with in general. – rlbond Jan 26 '18 at 21:06

kmario23 · Answer 1 · 2020-04-26T05:45:17.110

21

Take the exponential (np.exp()) of the log probabilities to get the actual probabilities back. This works because exponentiation is the inverse of the natural logarithm: e^log(p) = p, where p is a probability.

Below is an example:

# some input array
In [9]: a
Out[9]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# converting to probabilities using "softmax"
In [10]: probs = np.exp(a) / (np.exp(a)).sum()

# sanity check
In [11]: probs.sum()
Out[11]: 1.0

# obtaining log probabilities
In [12]: log_probs = np.log(probs)

In [13]: log_probs
Out[13]: 
array([-8.45855173, -7.45855173, -6.45855173, -5.45855173, -4.45855173,
       -3.45855173, -2.45855173, -1.45855173, -0.45855173])

# the log probabilities themselves do not sum to 1.0
In [14]: log_probs.sum()
Out[14]: -40.126965551706405

# get the probabilities back
In [15]: probabilities = np.exp(log_probs)

In [16]: probabilities.sum()   # check passed
Out[16]: 1.0

In [17]: probabilities
Out[17]: 
array([  2.12078996e-04,   5.76490482e-04,   1.56706360e-03,
         4.25972051e-03,   1.15791209e-02,   3.14753138e-02,
         8.55587737e-02,   2.32572860e-01,   6.32198578e-01])
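The scores in the question are a different case: they are summed log-likelihoods (log densities), not log probabilities, so exponentiating them directly either gives a value above 1 or underflows to 0. A minimal sketch of one common workaround follows, assuming every speaker model is equally likely a priori: subtract the maximum before exponentiating (the log-sum-exp trick) and normalize. The helper name `normalize_log_scores` is hypothetical, and the result is a relative weight across the candidate models rather than a calibrated probability. Plain Python is used here so the sketch has no dependencies.

```python
import math

def normalize_log_scores(log_scores):
    """Turn per-model log-likelihood totals into weights that sum to 1.

    Assumption: all models are equally likely a priori, so this is a
    softmax over the scores, not a calibrated probability.
    """
    peak = max(log_scores)
    # Subtracting the maximum first keeps exp() from overflowing and
    # guarantees that at least one term is exactly exp(0) = 1.
    shifted = [math.exp(s - peak) for s in log_scores]
    total = sum(shifted)
    return [w / total for w in shifted]

# Scores copied from the output in the question.
scores = [311.79769716, -5692.56559902, -6170.21460788,
          -6736.73192695, -6753.00196447]

# A raw score above 0 exponentiates to a value above 1.
print(math.exp(scores[0]) > 1.0)
# A very negative score underflows to 0.0 in double precision.
print(math.exp(scores[1]))

weights = normalize_log_scores(scores)
print(weights)
```

With gaps of several thousand between the scores, the weights saturate at 1 for the winner and 0 for the rest, so a threshold on them carries little information. Thresholding in the log domain, for example on the gap between the best and second-best score, may be more useful.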

edited Apr 26 '20 at 05:45

answered Jan 26 '18 at 16:48


I also tried the np.exp() function, but it does not give me an accurate result. It gives me an output array in scientific notation (including values greater than 1). How is that possible? A probability is never greater than 1. – Sandeep Jan 27 '18 at 05:55
@Sandeep without knowing the contents of arrays, it's tricky to reproduce your setting. – kmario23 Jan 27 '18 at 13:56
I included my array contents (the 5*5 output) in my question. Please look at that output and suggest how I can convert these array values to values between 0 and 1. I want to choose a threshold value, which is why I need values between 0 and 1. – Sandeep Jan 27 '18 at 17:00
Works great! @Sandeep you must be reading the output incorrectly. Numpy prints in scientific notation. Maybe try ```np.exp().tolist()``` for python list – Kurtis Streutker Apr 16 '20 at 13:58

Timoth Dev A · Answer 2 · 2020-05-16T21:41:43.387

The score_samples method of sklearn's GMM module returns the log of the probability density. Densities do not sum to 1; they integrate to 1.

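import numpy as np
from sklearn import mixture

# Note: mixture.GMM is the legacy API (removed in scikit-learn 0.20).
# Its replacement, mixture.GaussianMixture, returns only the log densities
# from score_samples, so there is no second value to unpack.
data = 10 * np.random.rand(100)
model = mixture.GMM(n_components=1).fit(data[:, None])
xfit = np.linspace(-5, 15, 5000)
logprob, _ = model.score_samples(xfit[:, None])
dx = xfit[1] - xfit[0]
# Riemann sum of the density over the grid: approximately 1
print(dx * np.sum(np.exp(logprob)))
# 0.999773872653 (the exact value varies with the random data)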
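The same point can be checked without scikit-learn. The sketch below uses only the standard library and a hand-written univariate normal density (the helper `normal_pdf` and the choice of standard deviation 0.1 are illustrative assumptions). It shows that a density value can exceed 1, which is why a log density can be positive, while the area under the curve is still 1.

```python
import math

def normal_pdf(x, mean=0.0, std=0.1):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# For a narrow Gaussian the density at the mean is well above 1,
# so its logarithm is positive.
peak_density = normal_pdf(0.0)
print(peak_density, math.log(peak_density))

# Riemann sum over [-1, 1], which spans ten standard deviations per side.
n_steps = 20000
dx = 2.0 / n_steps
area = sum(normal_pdf(-1.0 + i * dx) for i in range(n_steps)) * dx
print(area)
```

A density only turns into a probability once it is integrated over an interval, so exponentiating a single log density is not expected to give a value between 0 and 1.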

You can also calculate the probability of a data point belonging to a multivariate normal distribution.

Source: https://github.com/scikit-learn/scikit-learn/issues/4202
