
How do I determine the probability of the sentence "what is a cat?" with the associated PCFG:

Rule            Probability
S -> NP VB
NN -> CAT       1
DT -> what      1
VB -> is        0.5
VB -> be        0.5

How can this PCFG, together with the sentence, be represented as a hidden Markov model?

If each node in the model is "what", "is", "a", "cat", how do I model the probability connections between the nodes from the PCFG?


1 Answer


A PCFG defines (i) a distribution over parse trees and (ii) a distribution over sentences.

The probability of a parse tree under a PCFG is:

$$P(t) = \prod_{r \in t} p(r)$$

where the parse tree t is described as a multiset of rules r (it is a multiset because a rule can be used several times to derive a tree).

To compute the probability of a sentence, you have to consider every possible way of deriving it and sum over their probabilities:

$$P(s) = \sum_{t \,:\, \mathrm{yield}(t) = s} P(t)$$

where $\mathrm{yield}(t) = s$ means that the string of terminals (the yield) of t is the sentence s.
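In code, these two formulas are just a product over rules and a sum over trees. Here is a minimal Python sketch (the function names and the rule encoding are illustrative, not part of the question):

```python
from math import prod

def tree_probability(rules_used, rule_prob):
    """P(t): product of p(r) over the multiset of rules used to derive t."""
    return prod(rule_prob[r] for r in rules_used)

def sentence_probability(trees_of_sentence, rule_prob):
    """P(s): sum of P(t) over every parse tree t whose yield is s."""
    return sum(tree_probability(t, rule_prob) for t in trees_of_sentence)
```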

In your example, the probability of "what is a cat" is 0 because you cannot generate it with your grammar. Here's another example with a toy grammar:

Rule            Probability
S -> NP VP      1
NP -> they      0.5
NP -> fish      0.5
VP -> VBP NP    0.5
VP -> MD VB     0.5
VBP -> can      1
MD -> can       1
VB -> fish      1

The sentence "they can fish" has two possible parse trees:

(S (NP they) (VP (MD can) (VB fish)))
(S (NP they) (VP (VBP can) (NP fish)))

with probabilities:

S   NP    VP    MD    VB
1 * 0.5 * 0.5 * 1   * 1   = 1/4

and

S   NP    VP    VBP   NP
1 * 0.5 * 0.5 * 1   * 0.5 = 1/8

so its probability is the sum of the probabilities of both parse trees: 1/4 + 1/8 = 3/8.
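To double-check the arithmetic, here is a small Python sketch that encodes the toy grammar above and writes the two derivations out by hand as rule lists (no parser involved):

```python
from math import prod

# Toy grammar from the answer: rule -> probability.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("they",)): 0.5,
    ("NP", ("fish",)): 0.5,
    ("VP", ("VBP", "NP")): 0.5,
    ("VP", ("MD", "VB")): 0.5,
    ("VBP", ("can",)): 1.0,
    ("MD", ("can",)): 1.0,
    ("VB", ("fish",)): 1.0,
}

# The two parse trees of "they can fish", written as the rules they use.
tree_modal = [("S", ("NP", "VP")), ("NP", ("they",)),
              ("VP", ("MD", "VB")), ("MD", ("can",)), ("VB", ("fish",))]
tree_transitive = [("S", ("NP", "VP")), ("NP", ("they",)),
                   ("VP", ("VBP", "NP")), ("VBP", ("can",)), ("NP", ("fish",))]

p1 = prod(rule_prob[r] for r in tree_modal)        # 0.25  = 1/4
p2 = prod(rule_prob[r] for r in tree_transitive)   # 0.125 = 1/8
print(p1, p2, p1 + p2)                             # 0.25 0.125 0.375 (= 3/8)
```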

It turns out that, in the general case, it is too computationally expensive to enumerate every possible parse tree for a sentence. However, there is an O(n^3) algorithm that computes the sum efficiently: the inside-outside algorithm (see pages 17-18 of this PDF by Michael Collins).
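Only the inside pass is needed to get the sentence probability. Here is a hedged sketch of the inside (CKY-style) dynamic program, assuming the grammar is in Chomsky normal form, as the toy grammar above happens to be:

```python
from collections import defaultdict

def inside_probability(sentence, lexical, binary, start="S"):
    """CKY-style inside algorithm: sums P(t) over all parses in O(n^3 * |G|).

    lexical: word -> list of (nonterminal, prob) for rules X -> word
    binary:  list of (parent, left, right, prob) for rules X -> Y Z
    Assumes a grammar in Chomsky normal form.
    """
    n = len(sentence)
    # inside[i][j][X] = total probability that X derives words i..j-1
    inside = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(sentence):
        for nt, p in lexical.get(word, []):
            inside[i][i + 1][nt] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right, p in binary:
                    inside[i][j][parent] += p * inside[i][k][left] * inside[k][j][right]
    return inside[0][n][start]

# Toy grammar from the answer.
lexical = {"they": [("NP", 0.5)], "can": [("VBP", 1.0), ("MD", 1.0)],
           "fish": [("NP", 0.5), ("VB", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "VBP", "NP", 0.5), ("VP", "MD", "VB", 0.5)]

print(inside_probability("they can fish".split(), lexical, binary))  # 0.375 = 3/8
```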

Edit: corrected trees.

  • Could the forward algorithm (https://en.wikipedia.org/wiki/Forward_algorithm) also be used to compute the probability? – blue-sky May 11 '18 at 10:38
  • I do not think so. HMMs are to regular grammars what PCFGs are to CFGs (more or less), so you need more complex algorithms when you deal with PCFGs (a more expressive class of grammars). The inside-outside algorithm is the natural extension of the forward-backward algorithm (for HMMs) to PCFGs. More abstractly, the search space for HMMs is a simple Viterbi graph, whereas the search space for a PCFG is a hypergraph; more info [here](https://www.isi.edu/natural-language/teaching/cs562/2009/readings/semirings.pdf). – mcoav May 11 '18 at 10:46
  • If a rule did not exist for np -> fish but a rule did exist for np -> np, could I use the probability of np -> np instead of np -> fish? – blue-sky May 11 '18 at 12:19
  • I am not sure what you mean. When you parse or compute the probability of a sentence, you must consider all the grammar rules. However, if you remove NP -> fish from the grammar and replace it with NP -> NP, your grammar will not recognize the sentence "they can fish", so its probability will be 0. Usually, rules of the type X -> X are not useful (they add no information) and are hard to parse, since you can have infinite unary rewrites (np -> np -> np -> ... -> np) and potentially an infinite number of possible parses for a single sentence. – mcoav May 11 '18 at 14:02
  • should "VP -> MD VB 0.5" be "VP -> MD NP 0.5" in order to match parse tree "(S (NP they) (VP (MD can) (NP fish)))" ? – blue-sky May 11 '18 at 18:08
  • Sorry, I just modified the trees to solve the problem. – mcoav May 11 '18 at 18:15