Since extractors cannot take custom parameters (as answered in Stack Overflow: Can extractors be customized...), I am looking for an alternative way to solve the following problem.

I have many translations that can be combined. In my code snippet, a dimension can be combined with a factor, for instance "width multiplied by 2", but it can also stand alone ("width"), and there will be further cases like that. I want to classify those string inputs using pattern matching: "width" and "width multiplied by x" should be classified as "width" (key "w"), "height" and "height multiplied by x" as "height" (key "h"), and so on.

This should be done by the final match in the example code snippet below. That match will eventually contain many cases (6 for this example), each of which should take a key: String parameter ("w", "h", "l", "r", "t", "b").

What I want to achieve is passing the key (that is "w", "h", "l", "r", "t", "b" and so on) to the case Untranslation(v). But obviously I cannot do that: the unapply function can take implicit parameters, but no additional explicit ones.

So I am looking for an alternative but still concise way of classifying my string inputs.

implicit val translations: Map[String, String] = Map(
  "w" -> "width",
  "h" -> "height",
  "l" -> "left",
  "r" -> "right",
  "t" -> "top",
  "b" -> "bottom",
  // + some more translations
  "m" -> "multiplied by"
)

sealed trait CommandType
object CommandType {
  case object Unmodified extends CommandType
  case object Multiplied extends CommandType
  // ...
}

object Untranslation {
  def unapply(s: String)(implicit t: Map[String, String]): Option[CommandType] = {
    val key: String = "w" // should be variable by case
    val a: List[String] = t(key).split(" ").toList
    val b: List[String] = t("m").split(" ").toList
    val ab: List[String] = a ++ b
    s.split(" ").toList match {
      case `a` => Some(CommandType.Unmodified)
      case `ab` :+ value => Some(CommandType.Multiplied)
      // + some more cases
      case _ => None
    }
  }
}

"width multiplied by 2" match {
  case Untranslation(v) => println(v) // here I would like to pass the key ("w"/"h"/"l"/...)
  case _ => println("nothing found")
}
// outputs: Multiplied
ideaboxer
  • I don't understand at all what you are trying to do there. If you want to do something with "commands", why don't you parse it properly, and then interpret the parsed structures accordingly? Why do you want to "classify" anything? Why can't you do it just by checking whether a command `startsWith` some word? I don't even understand what you mean by "translations": do you mean geometric translations (seems semi-plausible, because you are talking about widths and heights and factors etc), or do you mean translations between languages (which languages?)? – Andrey Tyukin May 13 '18 at 22:47
  • @AndreyTyukin, not being OP I must guess, but I think the case is about different human languages. I imagine that the goal is to parse a string that is coming in a known fixed language into an inner abstract representation. And the plan seems to be that `t` contains translations of all relevant parts on their own. So for English `t("w")` is `"width"` and `t("m")` is `"multiplied by"`. And it looks like OP wants to use pattern matching to be able to parse strings that would be mapped onto `"wm(value)"` and `"hm(value)"` and so on and just `"w"` or just `"h"` and so on. – SergGr May 13 '18 at 23:05
  • @SergGr If this is the case, then the OP should be reading about stacked long-short-term memory neural networks with attention mechanisms and what not... Trying to do it with some convoluted hard-coded pattern matchings seems to be way too far away from everything that everyone else is doing. – Andrey Tyukin May 13 '18 at 23:13
  • @AndreyTyukin, I agree that real world human languages are much more complicated than this simple construction but it might be that the text being parsed is actually in a pseudo-natural language that fits this structure exactly. More to the point, you asked what the question means, and it looks to me that my interpretation above fits the code and text pretty well. – SergGr May 13 '18 at 23:24

3 Answers


You can easily create a parameterized class for extractors instead of an object:

class Untranslation(val key: String) {
  def unapply(s: String)(implicit t: Map[String, String]): Option[CommandType] = {
    val a: List[String] = t(key).split(" ").toList
    val b: List[String] = t("m").split(" ").toList
    val ab: List[String] = a ++ b
    s.split(" ").toList match {
      case `a` => Some(CommandType.Unmodified)
      case `ab` :+ value => Some(CommandType.Multiplied)
      // + some more cases
      case _ => None
    }
  }
}

To match, an extractor needs to have a stable identifier, which can be done by assigning it to a val (so you unfortunately need an extra line for each key, but of course they can be used in multiple matches):

val UntranslationW = new Untranslation("w")
val UntranslationT = new Untranslation("t")
// ... one val per remaining key

"width multiplied by 2" match {
  case UntranslationW(v) => println(s"w: $v")
  case UntranslationT(v) => println(s"t: $v")
  case _ => println("nothing found")
}
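
For reference, here is a minimal self-contained sketch of this approach, assuming Scala 2 and a reduced dictionary. The `Keys` holder object, the `Demo` object and the `classify` helper are hypothetical names introduced only for illustration. Grouping the instances in one object is just one way to keep the extra `val` lines out of the matching code.

```scala
sealed trait CommandType
object CommandType {
  case object Unmodified extends CommandType
  case object Multiplied extends CommandType
}

// One extractor instance per key; the key is fixed at construction time.
class Untranslation(val key: String) {
  def unapply(s: String)(implicit t: Map[String, String]): Option[CommandType] = {
    val a: List[String]  = t(key).split(" ").toList
    val ab: List[String] = a ++ t("m").split(" ").toList
    s.split(" ").toList match {
      case `a`       => Some(CommandType.Unmodified)
      case `ab` :+ _ => Some(CommandType.Multiplied)
      case _         => None
    }
  }
}

// Hypothetical holder: vals inside an object are stable identifiers,
// so Keys.W and Keys.H can be used directly in patterns.
object Keys {
  val W = new Untranslation("w")
  val H = new Untranslation("h")
}

object Demo {
  // Reduced dictionary, for illustration only.
  implicit val translations: Map[String, String] = Map(
    "w" -> "width",
    "h" -> "height",
    "m" -> "multiplied by"
  )

  // Returns the matched key together with the command type.
  def classify(s: String): String = s match {
    case Keys.W(kind) => s"w: $kind"
    case Keys.H(kind) => s"h: $kind"
    case _            => "nothing found"
  }
}
```

With this, `Demo.classify("width multiplied by 2")` gives `"w: Multiplied"`, and each key appears exactly once in the match, as the name of the extractor.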
Alexey Romanov

Possibly your question duplicates this one.

package ex

import language._

object units extends Dynamic {
  class Helper(kind: String) {
    val kindof = kind match {
      case "s" => Symbols.s
      case "m" => Symbols.m
    }
    def value = raw"(\d+)${kindof.name}".r
    object pair {
      def unapply(s: String): Option[(Int, Symbol)] =
        value.unapplySeq(s).map(vs => (vs.head.toInt, kindof))
    }
  }
  def selectDynamic(kind: String) = new Helper(kind)
  object Symbols { val s = 'sec ; val m = 'min }
}

object Test {
  def main(args: Array[String]): Unit = println {
    args(0) match {
      case units.s.pair(x, s) => s"$x ${s.name}"
      case units.s.value(x) => s"$x seconds"
      case units.m.value(x) => s"$x minutes"
    }
  }
}

The customization is built into the selection in the case expression. That string is used to construct the desired extractor.

$ scalac ex.scala && scala ex.Test 24sec
24 sec

$ scalac ex.scala && scala ex.Test 60min
60 minutes
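
To make the mechanism more explicit, here is a reduced sketch (an illustration, not the code above). Because `units` extends `Dynamic`, selecting a member that does not exist, such as `units.sec`, is rewritten by the compiler to `units.selectDynamic("sec")`. The `DynamicDemo` object, its `seconds` helper, and binding the regex to a local `val` before matching are assumptions made to keep the example simple.

```scala
import scala.language.dynamics
import scala.util.matching.Regex
import java.util.regex.Pattern

object units extends Dynamic {
  // Invoked for any member name that `units` does not define itself.
  // The selected name becomes the unit suffix of the generated regex.
  def selectDynamic(suffix: String): Regex =
    raw"(\d+)${Pattern.quote(suffix)}".r
}

object DynamicDemo {
  def seconds(s: String): Option[Int] = {
    // Equivalent to units.selectDynamic("sec").
    val secPattern = units.sec
    s match {
      case secPattern(n) => Some(n.toInt) // the whole string must match
      case _             => None
    }
  }
}
```

So the "parameter" of the extractor is simply the member name written in the selection.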
som-snytt
  • Thank you. That is almost what I need, but my equivalents to the units in your example would be stored as symbols. How can I replace `units.s`, `units.m` and so on by symbols? (type `Symbol`) – ideaboxer May 20 '18 at 20:32
  • `Symbol(kind)`. – som-snytt May 21 '18 at 16:50
  • I need to write `units.MyObject.theSymbol` where `theSymbol` is a Symbol defined as member of an object. This gives me a compiler error. – ideaboxer May 22 '18 at 14:35
  • I can no longer unbox your idea. If `TheSymbol` (to mean your unit?) is a stable reference, you can have a regular extractor of `(value, unit)` and `case Extract(value, TheSymbol)`, uppercase TheSymbol (or in backticks) makes it match on that value. Probably I have no idea what syntax you want. – som-snytt May 22 '18 at 16:01
  • Yes I just need to replace the `s` and `m` by stable identifiers which are symbols in my case. But I do not understand how I could write that. Say we have the following object: `object MyObject { val s = 's; val m = 'm }` (`'s` represents `s`; `'m` represents `m`). How could I add that to your snippet? – ideaboxer May 22 '18 at 16:57
  • It should be possible to use shapeless to provide compile-time safety between the path in the pattern `s` and the record member `s` with value `'sec`, but I hit a compiler crash; I'll update if I can figure that out later. – som-snytt May 22 '18 at 22:46
  • I cannot even write it in that form because my symbols are either imported from another file (where they are defined at a single place as part of an object), or they may be symbols containing spaces and thus I cannot write them using the `'` notation. In addition to that, since I have a lot of match cases, I cannot prepare any `val`s in preceding code lines because that would mean lots of additional lines of code and negatively affect the legibility of the code. – ideaboxer May 23 '18 at 20:08

Whether you want to implement a proper parser or not, you should at least create the data structures that can represent your commands faithfully.

Here is one proposal:

sealed trait Dimension {
  def translate(implicit t: Map[Symbol, String]) = 
    t(Symbol(toString.toLowerCase))
}
case object W extends Dimension
case object H extends Dimension
case object L extends Dimension
case object R extends Dimension
case object T extends Dimension
case object B extends Dimension
object Dimension {
  def all = List(W, H, L, R, T, B)
}

sealed trait CommandModifier {
  def translate(implicit t: Map[Symbol, String]): String
}
case object Unmodified extends CommandModifier {
  def translate(implicit t: Map[Symbol, String]) = ""
}
case class Multiplied(factor: Int) extends CommandModifier {
  def translate(implicit t: Map[Symbol, String]) = t('m) + " " + factor
}


case class Command(dim: Dimension, mod: CommandModifier) {
  def translate(implicit t: Map[Symbol, String]) = 
    dim.translate + " " + mod.translate
}

A Command is a proper case class that has the dimension and the modifier as members. The CommandModifiers are modeled as a separate sealed trait. The Dimensions (width, height etc.) are essentially just an enumeration. The short magic-value Strings "w", "h" have been replaced by symbols 'w, 'h etc.

Now you can implement an Untranslation extractor that extracts the entire command in one go, and therefore does not need any additional parameters:

object Untranslation {
  def unapply(s: String)(implicit t: Map[Symbol, String]): Option[Command] = {
    val sParts = s.split(" ").toList
    for (dim <- Dimension.all) {
      val a: List[String] = dim.translate.split(" ").toList
      val b: List[String] = t('m).split(" ").toList
      val ab: List[String] = a ++ b
      sParts match {
        case `a` => return Some(Command(dim, Unmodified))
        case `ab` :+ value => return Some(Command(dim, Multiplied(value.toInt)))
        // + some more cases
        case _ => // no match for this dimension, try the next one
      }
    }
    None
  }
}
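
One possible variation, as a sketch rather than a definitive version: the extractor above uses `return` inside the `for` loop and will throw a `NumberFormatException` if the last word is not a number (for example "width multiplied by two"). The version below tries the dimensions lazily with `collectFirst` and wraps the conversion in `Try`. It is reduced to two dimensions and omits the `translate` methods on modifiers and commands to stay short. `parseFor`, `Demo` and `parse` are hypothetical names, and `Symbol("m")` is used instead of the `'m` literal only because symbol literals are deprecated in newer Scala versions.

```scala
import scala.util.Try

sealed trait Dimension {
  def translate(implicit t: Map[Symbol, String]): String =
    t(Symbol(toString.toLowerCase))
}
case object W extends Dimension
case object H extends Dimension
object Dimension {
  val all: List[Dimension] = List(W, H)
}

sealed trait CommandModifier
case object Unmodified extends CommandModifier
case class Multiplied(factor: Int) extends CommandModifier

case class Command(dim: Dimension, mod: CommandModifier)

object Untranslation {
  // Tries to read `parts` as a command for one specific dimension.
  private def parseFor(dim: Dimension, parts: List[String])(
      implicit t: Map[Symbol, String]): Option[Command] = {
    val a: List[String]  = dim.translate.split(" ").toList
    val ab: List[String] = a ++ t(Symbol("m")).split(" ").toList
    parts match {
      case `a` => Some(Command(dim, Unmodified))
      case `ab` :+ value =>
        // A non-numeric factor yields None instead of an exception.
        Try(value.toInt).toOption.map(n => Command(dim, Multiplied(n)))
      case _ => None
    }
  }

  def unapply(s: String)(implicit t: Map[Symbol, String]): Option[Command] = {
    val parts = s.split(" ").toList
    // The iterator is lazy, so evaluation stops at the first dimension that matches.
    Dimension.all.iterator
      .map(dim => parseFor(dim, parts))
      .collectFirst { case Some(cmd) => cmd }
  }
}

object Demo {
  // Reduced English dictionary, for illustration only.
  implicit val en: Map[Symbol, String] = Map(
    Symbol("w") -> "width",
    Symbol("h") -> "height",
    Symbol("m") -> "multiplied by"
  )

  def parse(s: String): Option[Command] = s match {
    case Untranslation(cmd) => Some(cmd)
    case _                  => None
  }
}
```

Here `Demo.parse("width multiplied by two")` returns `None` rather than throwing, so the fallback case of a surrounding match is reached.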

A small example. Here is how you can parse and write out commands in English and German. First, the two dictionaries that map the formal symbols to actual words in a natural language:

val En = Map(
  'w -> "width",
  'h -> "height",
  'l -> "left",
  'r -> "right",
  't -> "top",
  'b -> "bottom",
  'm -> "multiplied by"
)

val De = Map(
  'w -> "Breite",
  'h -> "Höhe",
  'l -> "links",
  'r -> "rechts",
  't -> "oben",
  'b -> "unten",
  'm -> "mal"
)

Using the En-dictionary, you can now match commands in English:

for (example <- List(
  "width multiplied by 2",
  "top",
  "height multiplied by 42"
)) {
  println("-" * 60)
  implicit val lang = En
  example match {
    case Untranslation(v) => {
      println(v)
      println(v.translate(En))
      println(v.translate(De))
    }
    case _ => println("invalid command")
  }
}

Here is what is matched, and how it is translated into both English and German:

------------------------------------------------------------
Command(W,Multiplied(2))
width multiplied by 2
Breite mal 2
------------------------------------------------------------
Command(T,Unmodified)
top 
oben 
------------------------------------------------------------
Command(H,Multiplied(42))
height multiplied by 42
Höhe mal 42

The same works the other way round, from German to English:

for (example <- List(
  "Breite mal 2",
  "oben",
  "Höhe mal 42"
)) {
  println("-" * 60)
  implicit val lang = De
  example match {
    case Untranslation(v) => {
      println(v)
      println(v.translate(En))
      println(v.translate(De))
    }
    case _ => println("invalid command")
  }
}

Output:

------------------------------------------------------------
Command(W,Multiplied(2))
width multiplied by 2
Breite mal 2
------------------------------------------------------------
Command(T,Unmodified)
top 
oben 
------------------------------------------------------------
Command(H,Multiplied(42))
height multiplied by 42
Höhe mal 42

Note that the entire approach with string splitting and pattern matching is extremely brittle, and does not scale at all. If you want to do it properly, you have to write a proper parser (either using a parser generator, or using a parser combinator library).

Andrey Tyukin