2

I am writing a simple parser in Scala.

I have a base trait which represents an element in the file.

trait Token[T] {
    def stringValue: String
    def value: T
}

This is what I need: the string (text) value and the parsed value (which will sometimes be the same string). Now I want to have a set of subclasses for:

  • reserved symbols / keywords e.g. class, void etc.
  • special symbols e.g. +, / etc.
  • integer literals e.g. 123
  • real literals e.g. 1.23
  • string literals e.g. '123'

How would you implement a hierarchy like this? Since the set is finite, it would be nice to use case classes, I think. But an enumeration would be great too. How can the two be combined?


In other words, what's the best way to write this (below) in Scala in a more Scala-ish way?

public interface Token<T> {
    String stringValue();
    T value();
}

public enum ReservedSymbol implements Token<ReservedSymbol> {
    CLASS("class"), VOID("void");

    private String val;
    private ReservedSymbol(String val) { this.val = val; }

    public String stringValue() { return val; }
    public ReservedSymbol value() { return this; }
}


public class IntegerLiteral implements Token<Integer> {       
    private Integer val;
    public IntegerLiteral(String val) { this.val = Integer.valueOf(val); }

    public String stringValue() { return val.toString(); }
    public Integer value() { return val; }
}

etc.

Gianmarco
Sir Bohumil

2 Answers

5

When building such a hierarchy in Scala, try to apply the following principles:

  1. Sketch the class hierarchy you need. Design it in a way that only leaf nodes are instantiated and the inner nodes are abstract.
  2. Implement the inner nodes as traits
  3. Implement the leaf nodes as case classes

The reason is that case classes automatically add a lot of useful generated code (toString, unapply, equals, serialization support, etc.). But that code is generated in a way that is not compatible with inheritance between case classes (e.g. equals would not work properly).
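To make the generated members concrete, here is a minimal sketch. `IntLit` and `CaseClassDemo` are hypothetical names used only for illustration; they are not part of the question's code.

```scala
// Hypothetical leaf type, used only to illustrate the generated members.
case class IntLit(stringValue: String) {
  def value: Int = stringValue.toInt
}

object CaseClassDemo extends App {
  val a = IntLit("42")
  val b = IntLit("42")

  // Structural equality and a consistent hashCode are generated.
  assert(a == b)
  assert(a.hashCode == b.hashCode)

  // A readable toString is generated.
  assert(a.toString == "IntLit(42)")

  // unapply is generated, so the class can be used in pattern matches.
  val text = a match {
    case IntLit(s) => s
  }
  assert(text == "42")

  // copy creates a modified instance without mutating the original.
  assert(a.copy(stringValue = "7").value == 7)
}
```

None of these members had to be written by hand, which is why case classes are attractive for the leaves of the tree.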

Leaf types without parameters are usually modeled as case objects, while leaf types with parameters are modeled as case classes.

When you need to instantiate an inner node of the type tree, just add an artificial leaf and implement it as case class / case object.
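As a rough sketch of these rules applied to a token tree with sub-hierarchies: all names below (`NumberToken`, `IntToken`, `RealToken`, `Keyword`, `ClassKeyword`, `VoidKeyword`, `OtherKeyword`, `describe`) are assumptions made up for illustration, and `OtherKeyword` plays the role of the artificial leaf.

```scala
object HierarchyDemo extends App {
  // Root of the type tree: an inner node, so a trait.
  sealed trait Token[+T] {
    def stringValue: String
    def value: T
  }

  // Inner node for numeric tokens: also a trait.
  sealed trait NumberToken[+T] extends Token[T]

  // Leaves with parameters: case classes.
  case class IntToken(stringValue: String) extends NumberToken[Int] {
    def value: Int = stringValue.toInt
  }
  case class RealToken(stringValue: String) extends NumberToken[Double] {
    def value: Double = stringValue.toDouble
  }

  // Inner node for keywords; each keyword is its own parsed value.
  sealed trait Keyword extends Token[Keyword] {
    def value: Keyword = this
  }

  // Leaves without parameters: case objects.
  case object ClassKeyword extends Keyword { val stringValue = "class" }
  case object VoidKeyword extends Keyword { val stringValue = "void" }

  // Artificial leaf, for when a "plain" Keyword must be instantiated.
  case class OtherKeyword(stringValue: String) extends Keyword

  // Matching on the inner nodes covers all leaves below them.
  def describe(t: Token[Any]): String = t match {
    case n: NumberToken[_] => "number " + n.value
    case k: Keyword        => "keyword " + k.stringValue
  }

  assert(describe(IntToken("42")) == "number 42")
  assert(describe(VoidKeyword) == "keyword void")
  assert(describe(OtherKeyword("goto")) == "keyword goto")
}
```

Because the traits are sealed, the compiler can check that a match over `Token` handles every branch of the tree.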

You can also use Enumerations in Scala, but case classes are usually more practical. Enumerations are a good choice when you need to convert a given string to the corresponding enumeration value. You can do this with Enumeration.withName(String).
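A minimal sketch of the string lookup, assuming a hypothetical `Reserved` enumeration whose values are given explicit names (which also works for operators such as `+=` without backtick quoting):

```scala
// Hypothetical enumeration; the value names are the token texts.
object Reserved extends Enumeration {
  val Class      = Value("class")
  val Void       = Value("void")
  val PlusAssign = Value("+=")
}

object EnumDemo extends App {
  // withName maps the token text to the enumeration value.
  assert(Reserved.withName("void") == Reserved.Void)
  assert(Reserved.PlusAssign.toString == "+=")

  // withName throws NoSuchElementException for unknown names,
  // so wrap it when the input is not known to be a keyword.
  val unknown = scala.util.Try(Reserved.withName("goto")).toOption
  assert(unknown.isEmpty)

  // Alternative lookup that returns an Option without exceptions.
  val found = Reserved.values.find(_.toString == "class")
  assert(found == Some(Reserved.Class))
}
```

With many keywords, this keeps the whole list in one place while still giving each keyword a distinct value.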

In the answer from Alexey Romanov you can see how to apply these principles to a type tree with one root node and three leaf nodes.

  1. Token (inner node => trait)

    1.1. ClassSymbol (leaf node without parameters => case object)

    1.2. VoidSymbol (leaf node without parameters => case object)

    1.3. IntegerLiteral (leaf node with parameters => case class)

An example for your situation, using both enums and case classes:

trait Token[T]{
  def stringValue: String 
  def value: T
}

object ReservedSymbolEnum extends Enumeration {
  type ReservedSymbolEnum = Value
  val `class`, `void` = Value
  val NullValue = Value("null") // Alternative without quoting
}

case class ReservedSymbol(override val stringValue: String) extends Token[ReservedSymbolEnum.ReservedSymbolEnum] {
  def value = ReservedSymbolEnum.withName(stringValue)
}

case class StringLiteral(override val stringValue: String) extends Token[String] {
  override def value = stringValue
}

case class IntegerLiteral(override val stringValue: String) extends Token[Int] {
  override def value = stringValue.toInt
}

Some usage examples:

scala> def `void`=ReservedSymbol("void")
void: ReservedSymbol

scala> `void`.value
res1: ReservedSymbolEnum.Value = void

scala> def `42`=IntegerLiteral("42")
42: IntegerLiteral

scala> `42`.value
res2: Int = 42
Community
stefan.schwetschke
  • Thanks. But think about my situation - what would you do when trying to model a Token hierarchy? There will be ca. 50 keywords etc. I'd also like to make sub hierarchies like Int extends Number , Number extends Token etc. – Sir Bohumil Sep 16 '13 at 11:34
  • See edit above. The sub hierarchies are inner nodes in the type tree and hence you should model them as traits. – stefan.schwetschke Sep 16 '13 at 12:13
  • This is nice. What do you think about this code: http://ideone.com/uIdHxj that I've been working on in the meantime? – Sir Bohumil Sep 16 '13 at 12:22
  • Looks good. You could put the keywords in an enum to make them more type safe. – stefan.schwetschke Sep 16 '13 at 13:00
  • Your solution combined with some of my ideas from the link seems really good. How would you define an enum / case class for operators (e.g. `'+='` )? I don't want to *escape* them like you did with class and void. – Sir Bohumil Sep 16 '13 at 13:57
  • I have added an example to the enum. – stefan.schwetschke Sep 16 '13 at 14:13
3
sealed trait Token[+T] { // sealed means it only can be extended in this file
  def stringValue: String
  def value: T
}

// cast can be avoided if you are happy with extending Token[ReservedSymbol]
// as in the Java example
// class instead of trait so that it can have a constructor argument
sealed class ReservedSymbol[+T <: ReservedSymbol[T]](val stringValue: String) extends Token[T] {
  def value = this.asInstanceOf[T]
}

// no body necessary
case object ClassSymbol extends ReservedSymbol[ClassSymbol.type]("class")
case object VoidSymbol extends ReservedSymbol[VoidSymbol.type]("void")

// `val` is a reserved word in Scala, so the parameter is named `value`
case class IntegerLiteral(value: Int) extends Token[Int] { ... }
Alexey Romanov
  • Could you give a longer example? I will have 100 keywords (not only class and void), so a separate object declaration for each symbol will be bad. This is why I was thinking of an Enum in Java – Sir Bohumil Sep 16 '13 at 11:30