5

I was reading about the CYK algorithm, and there is one part of pseudo-code I cannot understand. The whole pseudo-code is:

let the input be a string S consisting of n characters: a1 ... an.
let the grammar contain r nonterminal symbols R1 ... Rr.
This grammar contains the subset Rs which is the set of start symbols.
let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
for each i = 1 to n
  for each unit production Rj -> ai
    set P[i,1,j] = true
for each i = 2 to n -- Length of span
  for each j = 1 to n-i+1 -- Start of span
    for each k = 1 to i-1 -- Partition of span
      for each production RA -> RB RC
        if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then
  S is member of language
else
  S is not member of language

These parts are which I am confused:

    for each production RA -> RB RC
      if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true

Would someone give some hints about these pseudocode?

syb0rg
  • 8,057
  • 9
  • 41
  • 81
  • @syb0rg: I intentionally leave the indentation, so that it is easier to locate the smaller snippet of code from the big chunk of code. – nhahtdh Apr 15 '13 at 01:10
  • @nhahtdh Fixed (formatting habit, sorry). – syb0rg Apr 15 '13 at 01:12
  • @syb0rg: The indentation of the smaller code snippet is a bit off (you can just copy and paste from the original code). – nhahtdh Apr 15 '13 at 01:14

1 Answers1

3

The pseudocode

For each production RA → RB RC:

if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true

Should be interpreted in the following way. Suppose that it's the case that P[j, k, B] is true. That means that the string formed from k characters starting at position j can derived from the nonterminal RB. If it's also the case that P[j + k, i - k, C] is true, then the string formed from the i - k characters starting at position j + k can be derived from nonterminal RC. Therefore, since RA → RB RC is a production, it's the case that the string formed from the i characters starting at position j can be derived from RA.

I think it might help to interpret that pseudocode as

For each production RA → RB RC:

if P[j,k,B] == true and P[j+k,i-k,C] == true, then set P[j,i,A] = true

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • Can you clarify what the indexes A B and C are please? – user2280838 Apr 15 '13 at 01:17
  • @user2280838- The algorithm numbers all of the nonterminals R_1, R_2, ..., R_n. Here, A, B, and C happen are the indices of the nonterminals in the production R_A -> R_B R_C. For example, if the production was S -> T U and S had index 1, T had index 2, and U had index 3, then we would have A = 1, B = 2, and C = 3. Does that help? – templatetypedef Apr 15 '13 at 01:19
  • It does help but what if A B and C as non terminals are defined more than once in the grammar? Is the assignment of the index sort of an ID value that helps it being distinguished by other nonterminals? – user2280838 Apr 15 '13 at 01:28
  • @user2280838- The pseudocode works by assigning every nonterminal a unique ID. You're not "redefining" the same nonterminal twice if you associate multiple productions with it; it's the same nonterminal each time. This means that even if you have something like S -> UT and S -> XY, the S is the same in both cases (and has the same index). – templatetypedef Apr 15 '13 at 01:39