
Given:

[image: trigram language model from the linked notes]

and the following:

[image: example trigram model probabilities, including q(runs | the, dog) = 0.5]

For:

q(runs | the, dog) = 0.5

Should this not be 1? For q(runs | the, dog): x_i = runs, x_{i-2} = the, x_{i-1} = dog.

The estimated probability is (with w_i swapped for x_i):

q(x_i | x_{i-2}, x_{i-1}) = count(x_{i-2}, x_{i-1}, x_i) / count(x_{i-2}, x_{i-1})

therefore:

count(the dog runs) / count(the dog) = 1 / 1 = 1
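
For reference, a minimal Python sketch of that calculation, assuming the training data were just the single sentence "the dog runs"; the notes don't actually show the corpus, so this is only an assumption:

    from collections import Counter

    # Assumed one-sentence training corpus (not given in the notes)
    corpus = [["the", "dog", "runs"]]

    trigram_counts, context_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["*", "*"] + sentence + ["STOP"]             # pad with start symbols and STOP
        for i in range(2, len(tokens)):
            trigram_counts[tuple(tokens[i - 2:i + 1])] += 1   # count(x_{i-2}, x_{i-1}, x_i)
            context_counts[tuple(tokens[i - 2:i])] += 1       # count(x_{i-2}, x_{i-1})

    # count(the dog runs) / count(the dog) = 1 / 1 = 1 under this assumed corpus
    print(trigram_counts[("the", "dog", "runs")] / context_counts[("the", "dog")])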

But in the example above the value is 0.5. How is 0.5 arrived at?

Based on http://files.asimihsan.com/courses/nlp-coursera-2013/notes/nlp.html#markov-processes-part-1

blue-sky
  • To answer your first question: no, it shouldn't. P(runs | the, dog) + P(STOP | the, dog) should sum to 1, given that both have the same context "the dog". To answer your second question: based on which training data are you computing count(the dog runs)? – user3639557 Feb 22 '16 at 04:57
  • @user3639557 "To answer your first question: No it shouldn't. The P (runs| the, dog) + P(STOP| the, dog) should sum to 1 given that both have the same context "the dog"" not sure what question your answering here as I asked how The P (runs| the, dog) = 0.5 when I arrive at value 1 ? The training data is V={the,dog,runs} U {STOP} U {*} – blue-sky Feb 22 '16 at 08:00
  • 1
    V={the,dog,runs, STOP, *} is not the training data. it's the vocabulary set. You haven't provided the training data. – user3639557 Feb 22 '16 at 13:03
  • @user3639557 Now I understand my error, thanks to your last comment. I don't have the training set. To get 0.5 as I originally asked, the training set is approximately: x1 = {the, dog, runs}, x2 = {the, dog, walks} – blue-sky Feb 22 '16 at 13:27
  • 1
    If you are going to work on these things, it's better to watch Michael Collins lectures on coursera. He covers ngram language modelling in good depth (along with some other nlp topics) and it's easy to follow him too: https://www.coursera.org/course/nlangp – user3639557 Feb 22 '16 at 14:13

1 Answer


The number 0.5 was not "arrived at" at all; the author just picked an arbitrary number for the purpose of illustration.

Any n-gram language model consists of two things: a vocabulary and transition probabilities. The model "does not care" how these probabilities were derived; the only requirement is that they are self-consistent (that is, for any prefix, the probabilities of all possible continuations sum to 1). For the model above this holds: e.g. p(runs | the, dog) + p(STOP | the, dog) = 1.

Of course, in practical applications we are indeed interested in how to "learn" the model parameters from some text corpus. You can calculate that your particular language model generates the following texts:

the           # with 0.5  probability
the dog       # with 0.25 probability
the dog runs  # with 0.25 probability
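
For concreteness, here is a small Python sketch of that computation; the q(·) values are my reading of the screenshot, and q(the | *, *) = 1 and q(STOP | dog, runs) = 1 are assumptions implied by the sentence probabilities listed above:

    # Trigram parameters: the 0.5 values as discussed; q(the | *, *) = 1 and
    # q(STOP | dog, runs) = 1 are assumed so that each context sums to 1.
    q = {
        ("*", "*", "the"): 1.0,
        ("*", "the", "STOP"): 0.5,
        ("*", "the", "dog"): 0.5,
        ("the", "dog", "STOP"): 0.5,
        ("the", "dog", "runs"): 0.5,
        ("dog", "runs", "STOP"): 1.0,
    }

    def sentence_probability(sentence):
        """p(x_1 .. x_n) = product of q(x_i | x_{i-2}, x_{i-1}), including the STOP transition."""
        tokens = ["*", "*"] + sentence + ["STOP"]
        p = 1.0
        for i in range(2, len(tokens)):
            p *= q[(tokens[i - 2], tokens[i - 1], tokens[i])]
        return p

    print(sentence_probability(["the"]))                  # 0.5
    print(sentence_probability(["the", "dog"]))           # 0.25
    print(sentence_probability(["the", "dog", "runs"]))   # 0.25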

From these sentence probabilities, we can "reverse-engineer" the training corpus: it might have consisted of 4 sentences:

the
the
the dog
the dog runs

If you count all the trigrams in this corpus and normalize the counts, you see that the resulting relative frequencies are equal to the probabilities from your screenshot. In particular, there is 1 sentence which ends after "the dog", and 1 sentence in which "the dog" is followed by "runs". That's how the probability 0.5 (=1/(1+1)) could have emerged.
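
Again only as a sketch, the same relative-frequency calculation on the reconstructed 4-sentence corpus (which is itself just a guess) recovers the 0.5:

    from collections import Counter

    # The reverse-engineered corpus guessed above: 4 sentences
    corpus = [["the"], ["the"], ["the", "dog"], ["the", "dog", "runs"]]

    trigram_counts, context_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["*", "*"] + sentence + ["STOP"]
        for i in range(2, len(tokens)):
            trigram_counts[tuple(tokens[i - 2:i + 1])] += 1
            context_counts[tuple(tokens[i - 2:i])] += 1

    def q(w, u, v):
        """Relative-frequency estimate q(w | u, v) = count(u, v, w) / count(u, v)."""
        return trigram_counts[(u, v, w)] / context_counts[(u, v)]

    print(q("runs", "the", "dog"))   # 1 / 2 = 0.5
    print(q("STOP", "the", "dog"))   # 1 / 2 = 0.5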

David Dale