I am currently working on a project involving NLP. I have implemented a CKY identifier as given in Jurafsky and Martin (algorithm on page 450). The table so produced actually stores the nonterminals in the table (instead of the usual boolean values). However, the only issue I am getting is to retrieve the parse tree.
Here is an illustration of what my CKY identifier does:
This is my grammar
S -> NP VP
S -> VP
NP -> MODAL PRON | DET NP | NOUN VF | NOUN | DET NOUN | DET FILENAME
MODAL -> 'MD'
PRON -> 'PPSS' | 'PPO'
VP -> VERB NP
VP -> VERB VP
VP -> ADVERB VP
VP -> VF
VERB -> 'VB' | 'VBN'
NOUN -> 'NN' | 'NP'
VF -> VERB FILENAME
FILENAME -> 'NN' | 'NP'
ADVERB -> 'RB'
DET -> 'AT'
And this is the algorithm:
for j from i to LENGTH(words) do
table[j-1,j] = A where A -> POS(word[j])
for i from j-2 downto 0
for k from i+1 to j-1
table[i,j] = Union(table[i,j], A such that A->BC)
where B is in table[i,k] and C is in table[k,j]
And this is what my parsing table looks like after being filled:
Now that I know that since S resides in [0,5], the string has been parsed, and that for k = 1 (as per the algorithm given in Martin and Jurafsky), we have S -> table[0][2] table[2][5] i.e. S -> NP VP
The only issue I am getting is that I have been able to retrieve the rules used, but then they are in a jumbled format, i.e. not on the basis of their appearance in parse tree. Can someone suggest an algorithm to retrieve the correct parse tree?
Thankyou.