
I fairly frequently match strings against regular expressions. In Java:

java.util.regex.Pattern.compile("\\w+").matcher("this_is").matches()

Ouch. Scala has many alternatives.

  1. "\\w+".r.pattern.matcher("this_is").matches
  2. "this_is".matches("\\w+")
  3. "\\w+".r unapplySeq "this_is" isDefined
  4. val R = "\\w+".r; "this_is" match { case R() => true; case _ => false}

The first is just as heavy-weight as the Java code.

The problem with the second is that you can't supply a compiled pattern ("this_is".matches("\\w+".r)). (This seems to be an anti-pattern, since almost every time a method takes a regex string to compile, there is also an overload that takes a compiled regex.)

The problem with the third is that it abuses unapplySeq and thus is cryptic.

The fourth is great when decomposing parts of a regular expression, but is too heavy-weight when you only want a boolean result.
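For reference, the four alternatives above can be checked side by side. This is only a sketch: the object and value names (`MatchAlternatives`, `Word`, `viaMatcher`, and so on) are made up for illustration, and the postfix call in option 3 is written with explicit dots so it compiles without the postfix-operator feature.

```scala
import scala.util.matching.Regex

object MatchAlternatives {
  val input = "this_is"
  val Word: Regex = "\\w+".r

  // 1. Go through the underlying java.util.regex.Pattern.
  val viaMatcher: Boolean = Word.pattern.matcher(input).matches()

  // 2. java.lang.String#matches takes the pattern as an uncompiled string.
  val viaString: Boolean = input.matches("\\w+")

  // 3. Call the extractor method directly.
  val viaUnapplySeq: Boolean = Word.unapplySeq(input).isDefined

  // 4. Pattern match against the Regex extractor.
  val viaPatternMatch: Boolean = input match {
    case Word() => true
    case _      => false
  }
}
```

All four are whole-string checks, so they should agree with each other for any given input.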

Am I missing an easy way to check for matches against a regular expression? Is there a reason why String#matches(regex: Regex): Boolean is not defined? In fact, where is String#matches(uncompiled: String): Boolean defined?

schmmd
  • It's worth noting that `String#matches(string: String)` is not defined by either the 2.9 spec or the [StringLike](http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.StringLike) type from the standard library. It is, in fact, an artifact of the definition of [Strings in Java](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#matches(java.lang.String)). – ig0774 Nov 28 '11 at 20:44
  • I don't understand what you mean by too heavy-weight in the first example? Do you mean that the code is too long, or do you mean that it's doing too much work? – Ian McLaird Nov 28 '11 at 21:12
  • too much code, the work is exactly what I want – schmmd Nov 29 '11 at 00:55
  • @ig0774, thanks for that point. I was confused why I couldn't find it. – schmmd Nov 29 '11 at 00:55

4 Answers


You can define a pattern like this :

scala> val Email = """(\w+)@([\w\.]+)""".r

findFirstIn returns Some[String] if the pattern matches anywhere in the input, or None otherwise.

scala> Email.findFirstIn("test@example.com")
res1: Option[String] = Some(test@example.com)

scala> Email.findFirstIn("test")
res2: Option[String] = None
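One thing to watch for: findFirstIn looks for a match anywhere in the input, while Matcher#matches requires the whole input to match. A small sketch of the difference, reusing the Email pattern from above (the object and method names are hypothetical, chosen only for this example):

```scala
import scala.util.matching.Regex

object PartialVsFull {
  val Email: Regex = """(\w+)@([\w\.]+)""".r

  // True if the pattern occurs anywhere inside the input.
  def containsMatch(s: String): Boolean = Email.findFirstIn(s).isDefined

  // True only if the entire input matches the pattern.
  def fullyMatches(s: String): Boolean = Email.pattern.matcher(s).matches()
}
```

With these definitions, `containsMatch("mail test@example.com now")` is true but `fullyMatches` on the same string is false, so the two are not interchangeable when validating input.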

You could even extract :

scala> val Email(name, domain) = "test@example.com"
name: String = test
domain: String = example.com

Finally, you can also use the conventional String.matches method (and even recycle the previously defined Email Regex):

scala> "david@example.com".matches(Email.toString)
res6: Boolean = true

Hope this will help.

David
  • @schmmd don't forget `.r` to build a `Regex`. – David Nov 29 '11 at 07:04
  • Oops! wouldn't it be nice to have `matches` defined in `Regex`? – schmmd Nov 30 '11 at 18:20
  • @schmmd, thanks for the idea. You can use the conventional String.matches method and recycle your previously defined `Regex` like this : `"david@example.com".matches(Email.toString)` -> will return `true`. – David Nov 30 '11 at 21:06
  • @David Wouldn't the regex given above, `"""(\w+)@([\w\.]+)""".r`, accept `abc@gmail_com` as a valid email? Shouldn't the regex be `"""(\w+)@([a-zA-Z0-9.]+)""".r`? – himanshuIIITian Feb 04 '17 at 14:15

I created a little "Pimp my Library" pattern for that problem. Maybe it'll help you out.

import util.matching.Regex

object RegexUtils {
  class RichRegex(self: Regex) {
    def =~(s: String) = self.pattern.matcher(s).matches
  }
  implicit def regexToRichRegex(r: Regex) = new RichRegex(r)
}

Example of use

scala> import RegexUtils._
scala> """\w+""".r =~ "foo"
res12: Boolean = true
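
On Scala 2.10 or later, roughly the same enrichment can be written as an implicit class, which avoids the separate conversion method. This is a sketch rather than a drop-in replacement: `RegexSyntax` and `RegexOps` are hypothetical names, and the private constructor parameter on a value class assumes Scala 2.11 or later.

```scala
import scala.util.matching.Regex

object RegexSyntax {
  // Adds =~ to any Regex; extending AnyVal avoids allocating a wrapper in most cases.
  implicit class RegexOps(private val self: Regex) extends AnyVal {
    // True only if the whole input matches the pattern.
    def =~(s: CharSequence): Boolean = self.pattern.matcher(s).matches()
  }
}
```

After `import RegexSyntax._`, an expression such as `"""\w+""".r =~ "foo"` works the same way as in the example above.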
Ian McLaird
  • Cool! Though I'd call the operator `~` rather than `~=` because operators that end in `=` look to me like in-place mutations (from C++ and Python conventions...). – Jim Pivarski Oct 09 '13 at 22:07
  • Yeah, I was aiming for perl's =~ but got the name backwards, apparently. – Ian McLaird Oct 10 '13 at 13:19
  • Just thought I'd mention Haskell has the `=~` operator for matching regexes too. I've seen `~=` used to mean not-equals, like `!=`. – Erik Post Oct 19 '13 at 20:33
  • Thanks for the comments, guys. I've edited the answer to reflect your suggestions. – Ian McLaird Oct 20 '13 at 05:12

I usually use

val regex = "...".r
if (regex.findFirstIn(text).isDefined) ...

but I think that is pretty awkward.

Ralph

Currently (Aug 2014, Scala 2.11) @David's reply reflects the norm.

However, it seems the r"..." string interpolator may be on its way to help with this. See How to pattern match using regular expression in Scala?
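
For readers on newer versions: as far as I know, Scala 2.13 added a `matches` method directly on `Regex`, which covers the Boolean-only case from the question. The sketch below assumes Scala 2.13 or later; `DirectMatch` and `isWord` are hypothetical names used only here.

```scala
import scala.util.matching.Regex

object DirectMatch {
  val Word: Regex = """\w+""".r

  // Assumes Scala 2.13+: Regex#matches tests the whole input against the pattern.
  def isWord(s: String): Boolean = Word.matches(s)
}
```

On older versions this will not compile, and one of the approaches above is still needed.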

akauppi