
I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:

case Process(word) =>
   word.firstLetter match {
      case([a-c][A-C]) =>
      case _ =>
   }
}

But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?

Toto
Bruce Ferguson
  • Be warned: in Scala (and *ML languages), pattern matching has another meaning, very different from regexes. –  Jan 08 '11 at 22:51
  • You probably want `[a-cA-C]` for that regular expression. –  Jan 08 '11 at 23:27
  • In Scala 2.8, strings are converted to `Traversable` (like `List` and `Array`). If you want the first 3 chars, try `"my string".take(3)`; for the first char, `"foo".head`. – shellholic Jan 09 '11 at 01:15

7 Answers


You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.

val Pattern = "([a-cA-C])".r
// take(1) stands in for the pseudocode's firstLetter; it returns a String
word.take(1) match {
   case Pattern(c) => // c is bound to the capture group here
   case _ =>          // no match
}
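
A minimal, self-contained sketch of the same idea, assuming the goal is to classify a word by its first letter. The names `FirstLetter` and `classify` are made up for illustration, and `take(1)` is one possible stand-in for the question's `word.firstLetter`.

```scala
import scala.util.matching.Regex

// Hypothetical name: one capture group holding a letter from a-c or A-C.
val FirstLetter: Regex = "([a-cA-C])".r

// take(1) yields a String of at most one character, so an empty word
// falls through to the default case instead of throwing.
def classify(word: String): String = word.take(1) match {
  case FirstLetter(c) => s"starts with $c"
  case _              => "no match"
}
```

Calling `classify("Cat")` should bind `c` to `"C"`, while `classify("dog")` and `classify("")` take the default branch.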
r0estir0bbe
asm
  • Beware that you cannot declare a capture group and then not use it (i.e. `case Pattern()` will not match here) – Jeremy Leipzig Jan 07 '13 at 15:51
  • Beware that you *must* use groups in your regular expression: `val Pattern = "[a-cA-C]".r` will not work. This is because match-case uses `unapplySeq(target: Any): Option[List[String]]`, which returns the matching groups. – rakensi Dec 16 '13 at 13:01
  • what does the `.r` mean at the end of the `val Pattern = ...`? – Kevin Meredith Feb 04 '14 at 20:59
  • It's a method on [StringLike](http://www.scala-lang.org/api/current/#scala.collection.immutable.StringLike) which returns a [Regex](http://www.scala-lang.org/api/current/#scala.util.matching.Regex). – asm Feb 05 '14 at 01:50
  • @rakensi No. `val r = "[A-Ca-c]".r ; 'a' match { case r() => } `. http://www.scala-lang.org/api/current/#scala.util.matching.Regex – som-snytt Mar 09 '15 at 23:28
  • @JeremyLeipzig ignoring groups: `val r = "([A-Ca-c])".r ; "C" match { case r(_*) => }`. – som-snytt Mar 09 '15 at 23:30
  • Is it possible without the val, i.e. with the regex inlined in case? – ByteEater Dec 31 '21 at 15:58
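
The comments above disagree on whether a capture group is required. The following sketch puts the three pattern shapes side by side, assuming Scala 2.11 or later (the version another comment on this page mentions). All names (`NoGroups`, `OneGroup`, `inGroup`, `captured`, `anyGroups`) are hypothetical.

```scala
import scala.util.matching.Regex

val NoGroups: Regex = "[a-cA-C]".r    // no capture group
val OneGroup: Regex = "([a-cA-C])".r  // one capture group

// A regex without groups is matched with an empty argument list.
def inGroup(s: String): Boolean = s match {
  case NoGroups() => true
  case _          => false
}

// A regex with one group binds exactly one variable.
def captured(s: String): Option[String] = s match {
  case OneGroup(c) => Some(c)
  case _           => None
}

// _* accepts any number of groups without binding them.
def anyGroups(s: String): Boolean = s match {
  case OneGroup(_*) => true
  case _            => false
}
```

In each case the number of pattern arguments has to line up with the number of capture groups, unless `_*` is used to ignore them.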

Since version 2.10, one can use Scala's string interpolation feature:

implicit class RegexOps(sc: StringContext) {
  def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true

Even better, one can bind regular expression groups:

scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123

scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25

It is also possible to set more detailed binding mechanisms:

scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler

scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20

scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive

scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
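
Applied to the original question, the interpolator could be used roughly as follows. This is a sketch that repeats the `RegexOps` definition from above so it stands alone; `describe` is a hypothetical helper name.

```scala
import scala.util.matching.Regex

// Same interpolator as above, repeated so the snippet is self-contained.
implicit class RegexOps(sc: StringContext) {
  def r = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

// The capture group is bound to `first`; the trailing .* is needed because
// the extractor has to match the whole word.
def describe(word: String): String = word match {
  case r"([a-cA-C])${first}.*" => s"starts with $first"
  case _                       => "no match"
}
```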

An impressive example of what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:

object T {

  class RegexpExtractor(params: List[String]) {
    def unapplySeq(str: String) =
      params.headOption flatMap (_.r unapplySeq str)
  }

  class StartsWithExtractor(params: List[String]) {
    def unapply(str: String) =
      params.headOption filter (str startsWith _) map (_ => str)
  }

  class MapExtractor(keys: List[String]) {
    def unapplySeq[T](map: Map[String, T]) =
      Some(keys.map(map get _))
  }

  import scala.language.dynamics

  class ExtractorParams(params: List[String]) extends Dynamic {
    val Map = new MapExtractor(params)
    val StartsWith = new StartsWithExtractor(params)
    val Regexp = new RegexpExtractor(params)

    def selectDynamic(name: String) =
      new ExtractorParams(params :+ name)
  }

  object p extends ExtractorParams(Nil)

  Map("firstName" -> "John", "lastName" -> "Doe") match {
    case p.firstName.lastName.Map(
          Some(p.Jo.StartsWith(fn)),
          Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
      println(s"Match! $fn ...$lastChar")
    case _ => println("nope")
  }
}
Sim
kiritsuku
  • Liked the answer very much, but when tried to use it outside REPL it locked (i.e. exactly the same code that worked in REPL didn't work in running app). Also there is a problem with using the `$` sign as a line end pattern: the compiler complains about lack of string termination. – Rajish Jul 11 '13 at 14:09
  • @Rajish: Don't know what can be the problem. Everything in my answer is valid Scala code since 2.10. – kiritsuku Jul 11 '13 at 18:56
  • @sschaef: that `case p.firstName.lastName.Map(...` pattern—how on earth do I read that? – Erik Kaplun Feb 16 '14 at 21:37
  • @ErikAllik read it as something like "when 'firstName' starts with 'Jo' and 'lastName' matches the given regex, then the match is successful". This is more an example of Scala's power; I wouldn't write this use case this way in production code. Btw, the usage of a Map should be replaced by a List, because a Map is unordered and for more values it isn't guaranteed anymore that the right variable matches to the right matcher. – kiritsuku Feb 17 '14 at 00:43
  • @sschaef: I think this sort of magic is going too far... unless you're doing some heavy regexp pattern matching in your app (and even then you'd need something way way more human readable). – Erik Kaplun Feb 17 '14 at 03:26
  • Also, be aware that using inner capture groups will mess up group bindings. The code can probably be modified to fix that bug, but my Scala knowledge is not enough for that. – Alex Abdugafarov Feb 25 '14 at 08:09
  • I wouldn't use it either, but that dynamic example is pretty funny. – som-snytt Mar 09 '15 at 23:51
  • This is very convenient for quick prototyping, but note that this creates a new instance of `Regex` every time the match is checked. And that is quite a costly operation that involves compilation of the regex pattern. – HRJ Apr 21 '15 at 06:34
  • Nice analog to Ruby's `case/when` statement. – Eric Walker Aug 18 '15 at 15:27
  • It seems if you use groups you *must* append a `${variable}`. – Eric Walker Aug 18 '15 at 15:42
  • This also doesn't seem to compile if the regex contains quote literals: `case r"stuff \" more stuff (.*)$s" => s`. You have to use triple quotes instead: `case r"""stuff " more stuff (.*)$s"""`. – dcastro Oct 05 '15 at 17:43
  • Wish some of the simpler approaches were part of the standard library string interpolations. Have you suggested this to the devs? – Luciano Oct 29 '15 at 08:03
  • Looks like something similar has been requested: https://issues.scala-lang.org/browse/SI-7496 – Luciano Nov 08 '15 at 21:30
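
On the inner-capture-group problem raised in the comments: one way to sidestep it, as far as I can tell, is to make inner groups non-capturing with `(?:...)`, so each capture group still corresponds to one bound variable. A sketch under that assumption, with hypothetical names `tag` and `tagNested`:

```scala
import scala.util.matching.Regex

// Same interpolator as in the answer, repeated so the snippet is self-contained.
implicit class RegexOps(sc: StringContext) {
  def r = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

// (?:foo|bar) groups the alternatives without capturing them, so the regex
// has exactly one capture group for the one bound variable.
def tag(s: String): String = s match {
  case r"((?:foo|bar)[0-9]+)${whole}" => whole
  case _                              => "no match"
}

// With a nested capturing group the regex yields two groups, but the
// pattern binds only one variable, so this case is not taken.
def tagNested(s: String): String = s match {
  case r"((foo|bar)[0-9]+)${whole}" => whole
  case _                            => "no match"
}
```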

As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:

word.matches("[a-cA-C].*")

You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
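
Wrapped in a small helper, this could look like the sketch below. The function names are hypothetical. One caveat worth knowing: by default `.` does not match line terminators, so a multi-line string needs the `(?s)` flag for `.*` to cover the rest of the input.

```scala
// True when the word starts with a, b or c in either case.
def startsWithAbc(word: String): Boolean =
  word.matches("[a-cA-C].*")

// Variant with the (?s) flag, so that . also matches line terminators.
def startsWithAbcMultiline(text: String): Boolean =
  text.matches("(?s)[a-cA-C].*")
```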

sepp2k

To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:

val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
  case Process("b", _) => println("first: 'b', some rest")
  case Process(_, rest) => println("some first, rest: " + rest)
  // etc.
}
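
One thing worth checking with `findAllIn`: it searches anywhere in the input, so with the sample text the regex also appears to pick up the `ah` inside `dah`. The sketch below collects the groups with `findAllMatchIn` and compares against a variant that adds a `\b` word boundary. The names `AtWordStart` and `split` are hypothetical, and `\S` is used as a shorthand for `[^\s]`.

```scala
import scala.util.matching.Regex

val Process: Regex     = """([a-cA-C])(\S+)""".r
// Assumption for illustration: require a word boundary before the first letter.
val AtWordStart: Regex = """\b([a-cA-C])(\S+)""".r

// Collect (first letter, rest) pairs for every match found in the text.
def split(re: Regex, text: String): List[(String, String)] =
  re.findAllMatchIn(text).map(m => (m.group(1), m.group(2))).toList
```

With `Process`, the text `"aha bah Cah dah"` yields four pairs, the last one coming from the middle of `dah`; with `AtWordStart` only the first three words are picked up.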
Community
Fabian Steeg
  • I'm really confused by the high hat ^. I thought "^" meant "Match the beginning of the line". It's not matching the beginning of the line. – Michael Lafayette Jun 04 '16 at 22:22
  • @MichaelLafayette: Inside of a character class (`[]`), the caret indicates negation, so `[^\s]` means 'non-whitespace'. – Fabian Steeg Jun 06 '16 at 12:02

First, note that a regular expression can be used on its own, without pattern matching. Here is an example:

import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
  case Some(v) => println(v)
  case _ =>
} // output: Scala

Second, combining regular expressions with pattern matching is very powerful. Here is a simple example:

val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
  case date(year, month, day) => "hello"
} // output: hello
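
The example above has no default case, so a string that does not match the regex would raise a `MatchError`. A slightly more defensive sketch, using a hypothetical `parseDate` helper that returns an `Option`:

```scala
import scala.util.matching.Regex

val Date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

// Returns the numeric parts when the input has the yyyy-mm-dd shape,
// and None otherwise. It does not check that the date actually exists.
def parseDate(s: String): Option[(Int, Int, Int)] = s match {
  case Date(year, month, day) => Some((year.toInt, month.toInt, day.toInt))
  case _                      => None
}
```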

Regular expressions are already very powerful on their own; Scala's pattern matching makes them more convenient to use. There are more examples in the Scala documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex

Haimei

String.matches is the way to do pattern matching in the regex sense.

But as a handy aside, word.firstLetter in real Scala code looks like:

word(0)

Scala treats a String as a sequence of Chars, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:

"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true

I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
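
Note that `word(0)` throws on an empty string. If empty input is possible, a variant based on `headOption` avoids that. This is only a sketch, and `firstLetterMatches` is a hypothetical name.

```scala
// headOption is None for an empty string, so exists simply returns false
// instead of throwing as word(0) would.
def firstLetterMatches(word: String): Boolean =
  word.headOption.exists(c => c.toString.matches("[a-cA-C]"))
```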

EDIT: To be clear, the way I would do this is, as others have said:

"Cat".matches("^[a-cA-C].*")
res14: Boolean = true

Just wanted to show an example as close as possible to your initial pseudocode. Cheers!

Janx
  • `"Cat"(0).toString` could be more clearly written as `"Cat" take 1`, imho. – David Winslow Jan 09 '11 at 17:08
  • Also (though this is an old discussion - I'm probably grave-digging): you can remove the '.*' from the end since it doesn't add any value to the regex. Just "Cat".matches("^[a-cA-C]") – akauppi Mar 29 '13 at 14:47
  • Today on 2.11, `val r = "[A-Ca-c]".r ; "cat"(0) match { case r() => }`. – som-snytt Mar 09 '15 at 23:45
  • What does the hi hat (^) mean? – Michael Lafayette Jun 05 '16 at 00:39
  • It's an anchor meaning 'start of the line' (https://www.cs.duke.edu/csl/docs/unix_course/intro-73.html). So everything that follows the hi hat will match the pattern if it is the first thing on the line. – Janx Jun 05 '16 at 04:52
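
A caveat on dropping the trailing `.*`: `String.matches` tests the whole string, not a prefix, so without `.*` only one-character strings can match. A short sketch to check this, with value names chosen for illustration:

```scala
// matches requires the entire string to match the regex.
val withTail    = "Cat".matches("^[a-cA-C].*") // whole string is covered
val withoutTail = "Cat".matches("^[a-cA-C]")   // "at" is left over
val singleChar  = "C".matches("^[a-cA-C]")     // one character, fully covered

// findFirstIn searches instead of matching the whole string, so here the
// ^ anchor does the work and no trailing .* is needed.
val foundAtStart = "^[a-cA-C]".r.findFirstIn("Cat").isDefined
```

Here `withTail`, `singleChar` and `foundAtStart` come out `true`, while `withoutTail` is `false`.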

Note that the approach from @AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:

scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*

scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo

scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

And with no .* at the end:

scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)

scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match
mikhail_b
  • Idiomatically, `val MY_RE2 = "(foo|bar)".r.unanchored ; "foo123" match { case MY_RE2(_*) => }`. More idiomatically, `val re` without all caps. – som-snytt Mar 09 '15 at 23:41
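
The `unanchored` suggestion from the comment can be compared with the default anchored behaviour in a small sketch. The names `Anchored`, `Unanchored` and `firstHit` are hypothetical.

```scala
import scala.util.matching.Regex

val Anchored: Regex   = "(foo|bar)".r
val Unanchored: Regex = Anchored.unanchored

// Returns the captured group, or "No match" when the extractor fails.
def firstHit(re: Regex, s: String): String = s match {
  case re(m) => m
  case _     => "No match"
}
```

With `Anchored`, `"foo123"` gives `"No match"` because the whole string has to match; with `Unanchored`, both `"foo123"` and `"abcfoo123"` give `"foo"`.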