I created a unigram language model for a sentence completion implementation. I have every word together with its number of occurrences.
I'm confused about how to compare them from here. I think I have to calculate the probability of each candidate and take the highest one.
So if 3 words can fill the blank, do I compare the number of occurrences of each word and take the highest? Is this the proper implementation?
Or do I divide the number of occurrences of each word by the total number of (distinct?) words in the training set?
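To make the two options concrete, here is a minimal sketch of what I mean. The toy corpus, the candidate list, and the names `counts`, `total_tokens` and `unigram_prob` are all made up for illustration. It assumes the usual maximum-likelihood unigram estimate, count(w) divided by the total number of tokens (not the number of distinct words).

```python
from collections import Counter

# Hypothetical toy training text; the real model would use the full corpus.
tokens = "the cat sat on the mat the dog sat".split()

# Word -> number of occurrences in the training set.
counts = Counter(tokens)

# Total number of tokens (every occurrence counts), not the vocabulary size.
total_tokens = sum(counts.values())

def unigram_prob(word):
    """Maximum-likelihood unigram estimate: count(word) / total tokens."""
    return counts[word] / total_tokens

# Hypothetical candidates that could complete the sentence.
candidates = ["cat", "sat", "the"]

# Option 1: pick the candidate with the highest raw count.
best_by_count = max(candidates, key=lambda w: counts[w])

# Option 2: pick the candidate with the highest unigram probability.
best_by_prob = max(candidates, key=unigram_prob)

print(best_by_count, best_by_prob)
```

As far as I can tell, both options pick the same word here, because the denominator is identical for every candidate, so dividing does not change the ranking.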
Thank you.