3

I'm new to functional programming and was wondering how one solves the problem of calculating the set of nullable nonterminals in a context-free grammar in a pure functional way without using variable assignments.

A nullable nonterminal is a nonterminal directly yielding empty, e.g., A ::= , or having a body containing of nullable nonterminals, e.g., A ::= B C D, where all B C and D yield empty.

I'm using the following definitions in Scala to define a grammar:

case class Grammar(name:String, startSymbol:Nonterminal, rules:List[Rule])
case class Rule(head: Nonterminal, body:List[Symbol])
abstract class Symbol
case class Terminal(c:Char) extends Symbol
case class Nonterminal(name:String) extends Symbol

A basic algorithm is that to gather all directly nullable nonterminals and put them in a set. Then in each iteration try to determine which production rules have all nullable nonterminals on their body. Those nonterminals will be added to the set until no new nonterminal is added to the set.

I have implemented this procedure in Scala as:

  def getNullableNonterminals(grammar:Grammar) = {

  var nieuw : Set[Nonterminal] = (for(Rule(head, Nil) <- grammar.rules) yield head) (collection.breakOut)
  var old = Set[Nonterminal]()

  while(old != nieuw) {
    old = nieuw
    for{
        Rule(head, symbols) <- grammar.rules
        if symbols.length > 0
        if symbols.forall( s => s.isInstanceOf[Nonterminal] && old.contains(s.asInstanceOf[Nonterminal]))
       } nieuw = nieuw + head
  }
  nieuw   
}

Although the code is much shorter than the equivalent Java version, it uses variables. Any suggestions to rewrite this piece of code in a functional style?

Wickoo
  • 6,745
  • 5
  • 32
  • 45

3 Answers3

1

Here is a more idiomatic Scala solution:

object Algorithm {

  def getNullableNonterminals(grammar:Grammar) = {
    loop(grammar, Set())
  }

  @tailrec
  private def loop(grammar: Grammar, nullablesSoFar: Set[Nonterminal]): Set[Nonterminal] = {
    val newNullables = generateNew(grammar, nullablesSoFar)
    if (newNullables.isEmpty)
      nullablesSoFar //no new nullables found, so we just return the ones we have
    else
      loop(grammar, nullablesSoFar ++ newNullables) //add the newly found nullables to the solution set and we keep going
  }

  private def generateNew(grammar: Grammar, nullableSoFar: Set[Nonterminal]) = {
    for {
      Rule(head, body) <- grammar.rules
      if !nullableSoFar.contains(head)
      if body.forall(isNullable(_, nullableSoFar))
    } yield head
  }

  //checks if the symbol is nullable given the current set of nullables
  private def isNullable(symbol: Symbol, provenNullable: Set[Nonterminal]) = symbol match {
    case Terminal(_) => false
    case x@Nonterminal(_) => provenNullable.contains(x)
  }

}

The while statement is replaced with a recursive function - loop. Also, avoid using isInstanceOf - pattern matching is much better suited for this.

Small observation - make the Symbol class sealed, since this can enforce warnings of missing cases in pattern matches.

Marius Danila
  • 10,311
  • 2
  • 34
  • 37
  • Thanks for your answer. It's closer to a more functional way, but again it uses the same fixed point mechanism I used, as indicated by @mhs. – Wickoo Jan 21 '13 at 13:31
  • Not really, since I'm not recomputing the nullables set all over again at each iteration. I'm just adding incrementally based on past results . It can easily be optimized by removing rules which are proven to lead to nullable or non-nullable results. Also, I wouldn't call a solution with two mutable variables a functional one – Marius Danila Jan 21 '13 at 13:35
  • By fixed point computation, I meant that u are checking at each iteration until there is no change. I was not talking about recomputation. – Wickoo Jan 21 '13 at 13:43
  • @mhs Any comments on: "calling your solution functional despite having mutable states?" The thing that I like about mhs solution is that it only describes the specification without going into how to compute it. – Wickoo Jan 21 '13 at 13:59
  • Ok, I understand now. But you should note that whilst this solution works for rules of type "B -> aBc", @mhs's solution will cycle infinitely. This is also the case with mutually recursive rules – Marius Danila Jan 21 '13 at 14:02
  • Yes, you're right. For left recursive rules of the form A ::= A B, the process does not terminate. There is a need to keep track of those. – Wickoo Jan 21 '13 at 14:12
  • @M.A.D. @Ali I did not call my solution functional, but see my comment about the state monad. "B -> aBc" does terminate because `forall` stops on the first false result, here on `a`. "A -> A B" will indeed not terminate, good observation. Kiama (http://code.google.com/p/kiama/) enables you to work with cyclic grammars, you might want to consider using that great library in general. – Malte Schwerhoff Jan 21 '13 at 14:19
  • @mhs I heard about Kiama, but I'm not sure how it may help. Do you have any concrete example of how Kiama may help in this case? – Wickoo Jan 21 '13 at 14:45
  • @Ali Sorry, not of the cuffs, but their docs on cyclic rules might spark an idea. – Malte Schwerhoff Jan 21 '13 at 16:46
  • @M.A.D. I'm now convinced that for a generic solution we need a fixed-point calculation mechanism. So until someone else provides a fix for the pure recursive one, I'm going to accept your answer as it correctly does a fixed point calculation in a functional way. Though, in the end, my own version maybe shorter and more understandable. – Wickoo Jan 22 '13 at 15:06
1

Here is another approach using memoisation (a reference, another reference), which avoids the need for a fixed-point computation as in yours and M. A. D.'s solution. Moreover, it is a general pattern applicable to loads of scenarios. Have a look at the Scalaz implementation.

def getNullableNonterminals(g: Grammar): Iterable[Nonterminal] = {
  /* Cache that is used by isNullable to memoise results. */
  var cache: Map[Nonterminal, Boolean] = Map()

  /* Assumption: For each nonterminal nt there exists only one rule r
   * such that r.head == nt.
   */
  var rules: Map[Nonterminal, List[Symbol]] = g.rules.map(r => (r.head, r.body)).toMap

  def isNullable(s: Symbol): Boolean = s match {
    case _: Terminal => false
    case nt: Nonterminal =>
      /* Either take the cached result, or compute it and store it in the cache. */
      cache.getOrElse(nt, {
        /* rules(nt) assumes that there is a rule for every nonterminal */
        val nullable = rules(nt) forall isNullable
        cache += ((nt, nullable))
        nullable
      })
  }

  rules.keys filter isNullable
}

Test case:

val ta = Terminal('a')
val tb = Terminal('b')

val ntX = Nonterminal("X")
val ntY = Nonterminal("Y")
val ntZ = Nonterminal("Z")
val ntP = Nonterminal("P")
val ntQ = Nonterminal("Q")
val ntR = Nonterminal("R")
val ntS = Nonterminal("S")

val rX = Rule(ntX, ntP :: ntQ :: Nil)
val rY = Rule(ntY, ntP :: ta :: ntQ :: Nil)
val rZ = Rule(ntZ, ntR :: Nil)
val rP = Rule(ntP, ntQ :: Nil)
val rQ = Rule(ntQ, Nil)
val rR = Rule(ntR, tb :: Nil)
val rS = Rule(ntS, ntX :: ntY :: ntZ :: Nil)

val g = Grammar("Test", ntS, List(rX, rY, rZ, rP, rQ, rR, rS))

getNullableNonterminals(g) foreach println
  // Nonterminal(Q), Nonterminal(X), Nonterminal(P)
Community
  • 1
  • 1
Malte Schwerhoff
  • 12,684
  • 4
  • 41
  • 71
  • Great! This is what I was looking for: rules.keys filter isNullable Btw, your assumption that each nonterminal is mapped to a single list of symbols is not always true. For example, for an alternate such as A ::= B C | C D, we have two rules with the same head, so we will need a map from heads to a set of bodies. Another question: Is it possible to do it without var, to have no side effect at all? Based on your solution I implemented something without memorisation but it may be slow for large grammars. – Wickoo Jan 21 '13 at 13:30
  • Well, you can always replace the pair (var, immutable collection) by (val, mutable collection), but you only "hide" the side effects one step further away. You could also pass the cache as an additional argument to `isNullable` (and return it as an additional return value) instead of accessing global state. See the nice concept of state monads (http://stackoverflow.com/questions/7734756/scalaz-state-monad-examples) for a general treatment of this idea. However, at the end of the day you do need some sort of updateable state, because without that you simply can't implement a cache. – Malte Schwerhoff Jan 21 '13 at 13:38
  • 1
    @Ali By the way, a very interesting use of memoisation is described in the paper "Packrat Parsing: Simple, Powerful, Lazy, Linear Time" by Bryan Ford: http://pdos.csail.mit.edu/~baford/packrat/icfp02/packrat-icfp02.pdf – Malte Schwerhoff Jan 21 '13 at 13:45
  • This method is not going to work for recursive rules of the form A ::= alpha1 A alpha2, where alpha1 and alpha2 are arbitrary sequences of symbols. In a sense I think we need a fixed point calculation method anyway... – Wickoo Jan 22 '13 at 15:04
0

I have finally found time to write an example of how to do the grammar nullability computation using circular attribute grammars. The code below uses our Kiama language processing library for Scala. You can find the full source code of the example and tests in Kiama. See SemanticAnalysis.scala for the main attribution code, e.g., nullable.

In short, the approach does the following:

  • represents a grammar as an abstract syntax tree structure,

  • performs name analysis on the tree structure to resolve references from uses of grammar symbols to definitions of those symbols, and

  • computes nullability as a circular attribute on the resulting DAG structure.

The attribute definitions I use are quite similar to the ones used as examples in the paper Circular Reference Attribute Grammars by Magnusson and Hedin from LDTA 2003. They implement circular attributes in their JastAdd system and I would highly recommend the paper for anyone wishing to understand this topic. We use essentially the same algorithms in Kiama.

Here is the definition of the AST that the example uses. Tree is a Kiama type that provides some common behaviour.

sealed abstract class GrammarTree extends Tree

case class Grammar (startRule : Rule, rules : Seq[Rule]) extends GrammarTree

case class Rule (lhs : NonTermDef, rhs : ProdList) extends GrammarTree

sealed abstract class ProdList extends GrammarTree
case class EmptyProdList () extends ProdList
case class NonEmptyProdList (head : Prod, tail : ProdList) extends ProdList

case class Prod (symbols : SymbolList) extends GrammarTree

sealed abstract class SymbolList extends GrammarTree
case class EmptySymbolList () extends SymbolList
case class NonEmptySymbolList (head : Symbol, tail : SymbolList) extends SymbolList

sealed abstract class Symbol extends GrammarTree
case class TermSym (name : String) extends Symbol
case class NonTermSym (nt : NonTermUse) extends Symbol

sealed abstract class NonTerm extends GrammarTree {
    def name : String
}
case class NonTermDef (name : String) extends NonTerm
case class NonTermUse (name : String) extends NonTerm

The code below shows the definition of the nullable attribute. It starts out false and then a fixed point "loop" is entered to compute until the value stabilises. The cases show how to compute the attribute for different types of nodes in the AST.

Kiama's circular attribute constructor incorporates all of the implementation of the attributes, including storage caching, fixed point detection etc.

val nullable : GrammarTree => Boolean =
    circular (false) {

        // nullable of the start rule
        case Grammar (r, _) =>
            r->nullable

        // nullable of the right-hand side of the rule
        case Rule (_, rhs) =>
            rhs->nullable

        // nullable of the component productions
        case EmptyProdList () =>
            false
        case NonEmptyProdList (h, t) =>
            h->nullable || t->nullable

        // nullable of the component symbol lists
        case Prod (ss) =>
            ss->nullable
        case EmptySymbolList () =>
            true
        case NonEmptySymbolList (h, t) =>
            h->nullable && t->nullable

        // terminals are not nullable
        case TermSym (_) =>
            false

        // Non-terminal definitions are nullable if the rule in which they
        // are defined is nullable. Uses are nullable if their associated
        // declaration is nullable.
        case NonTermSym (n) =>
            n->nullable
        case n : NonTermDef =>
            (n.parent[Rule])->nullable
        case n : NonTermUse =>
            (n->decl).map (nullable).getOrElse (false)

    }

The reference attribute decl is the one that connects a non-terminal use to its corresponding definition on the left-hand side of a rule. The parent field is a reference from a node to its parent in the AST.

Since the nullability of one rule or symbol depends on the nullability of others, what you get is a set of attribute occurrences that participate in a dependence cycle. The result is a declarative version of the nullability calculation that closely resembles a "textbook" definition. (The example also defines attributes for FIRST and FOLLOW set computations that are defined in terms of nullability.) Circular attributes combine memoisation and fixed point computation in a convenient package for this kind of problem.

inkytonik
  • 534
  • 2
  • 6