Let's say I have this code:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)

I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:

 val pattern = """one.*two(\d+)three""".r
 val pattern(aMatch) = string
 println(aMatch) // prints 483

Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?

polygenelubricants
Geo

5 Answers

119

Here's an example of how you can access group(1) of each match:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
   m => println(m.group(1))
}

This prints "483" (as seen on ideone.com).
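If the captured groups are needed as values rather than printed, findAllMatchIn and findFirstMatchIn return Match objects directly. The following is a minimal sketch; the input string has a second, hypothetical occurrence ("two77three") added only to show that every match is visited.

```scala
val string = "one493two483three two77three" // second occurrence is illustrative only
val pattern = """two(\d+)three""".r

// findAllMatchIn yields Regex.Match objects, so capture groups are available directly.
val groups: List[String] = pattern.findAllMatchIn(string).map(_.group(1)).toList
println(groups) // List(483, 77)

// findFirstMatchIn returns an Option[Match] for the first occurrence only.
val first: Option[String] = pattern.findFirstMatchIn(string).map(_.group(1))
println(first) // Some(483)
```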


The lookaround option

Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:

val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)

The above also prints "483" (as seen on ideone.com).
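Because the lookarounds are zero-width, each match is exactly the digit run, so the results of findAllIn can be converted without touching groups. A small sketch, assuming a made-up input with several occurrences:

```scala
// Hypothetical input: the last run of digits is followed by "four", not "three".
val string = "two1three two22three two333four"
val pattern = """(?<=two)\d+(?=three)""".r

// Every match is only the digits, so it can be mapped straight to Int.
val numbers: List[Int] = pattern.findAllIn(string).map(_.toInt).toList
println(numbers) // List(1, 22)
```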

polygenelubricants
51
val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r

string match {
  case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
  case _ => // no match
}
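Which occurrence ends up in the variable depends on how the pattern is written, because a Regex used as an extractor has to match the whole input. The sketch below compares a greedy prefix, a reluctant prefix, and Regex.unanchored; the input string and the helper name extract are hypothetical and chosen only for illustration.

```scala
import scala.util.matching.Regex

// Hypothetical input with two candidate occurrences.
val string = "two1three two2three"

val greedy     = """.*two(\d+)three.*""".r        // greedy prefix backtracks from the end
val lazyPrefix = """.*?two(\d+)three.*""".r       // reluctant prefix stops as early as possible
val unanchored = """two(\d+)three""".r.unanchored // no wildcards needed; uses the first match

// Hypothetical helper: apply a regex as an extractor and return the captured group.
def extract(r: Regex, input: String): Option[String] = input match {
  case r(digits) => Some(digits)
  case _         => None
}

println(extract(greedy, string))     // Some(2)
println(extract(lazyPrefix, string)) // Some(1)
println(extract(unanchored, string)) // Some(1)
```

With unanchored, the original pattern can be used in a match/case without adding wildcards on either side.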
caiiiycuk
  • 7
    This is the simplest way by far. You use the regex object ("pattern") in a match/case and extract the group into the variable a483. The problem with this case is that the pattern should have wildcards on both sides: val pattern = """.*two(\d+)three.*""".r – makingthematrix Feb 02 '16 at 14:56
  • Yes. I don't think the above is immediately clear, but once you understand that it's assigning the digit-matching group to the variable 'a483', it makes more sense. Perhaps rewrite it in a clearer fashion? – Brian Agnew Mar 03 '16 at 14:12
  • 1
    This is the Scala way with regex. For people who don't understand the magic behind this answer, try searching for "scala regex extractor" or "scala unapply regex", etc. – JasonWayne Jul 07 '16 at 07:39
  • The semantics are unclear. Is this the first, last, or a random match from the string? – user239558 Jun 12 '18 at 11:46
21

Starting with Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:

"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"

Or even:

val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483

If you expect non-matching input, you can add a default case:

"one493deux483three" match {
  case s"${x}two${y}three" => y
  case _                   => "no match"
}
// String = "no match"
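Note that the interpolator pattern binds plain Strings and does not check that the middle part consists of digits. If that matters, a guard on the case can enforce it. This is a sketch assuming Scala 2.13 or later; the function name digitsBetween and the second input are hypothetical.

```scala
// Hypothetical helper: returns the number between "two" and "three", if there is one.
def digitsBetween(input: String): Option[Int] = input match {
  // The guard rejects inputs where the captured part is empty or not all digits.
  case s"${prefix}two${digits}three" if digits.nonEmpty && digits.forall(_.isDigit) =>
    Some(digits.toInt)
  case _ => None
}

println(digitsBetween("one493two483three")) // Some(483)
println(digitsBetween("onetwoabcthree"))    // None
```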
Xavier Guihot
17

You want to look at group(1); you're currently looking at group(0), which is "the entire matched string".

See this regex tutorial.

Stephen
  • 1
    can you illustrate on the input I provided? I tried to call `group(1)` on what's returned by findAllIn but I get an IllegalStateException. – Geo Jun 16 '10 at 06:00
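On the asker's input, one way to do this is sketched below. The iterator returned by findAllIn also exposes the match data of its current match, so it most likely has to be advanced before group(1) is meaningful; the exact behavior of calling group(1) before advancing may differ between Scala versions, so this sketch always calls next() first.

```scala
import scala.collection.mutable.ListBuffer

val string = "one493two483three"
val pattern = """two(\d+)three""".r

val it = pattern.findAllIn(string)
val found = ListBuffer.empty[String]
while (it.hasNext) {
  it.next()            // advance to the next match (the returned whole match is ignored here)
  found += it.group(1) // group(1) now refers to the match just consumed
}
println(found.toList) // List(483)
```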
5
def extractFileNameFromHttpFilePathExpression(expr: String) = {
  // Define the regex; the dot before the extension is escaped so it only matches a literal "."
  val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r
  // findFirstMatchIn returns Option[Match] (findAllMatchIn returns Iterator[Match]);
  // Match has methods to access capture groups.
  regex.findFirstMatchIn(expr) match {
    case Some(i) => i.group(1)
    case None => "regex_error"
  }
}
extractFileNameFromHttpFilePathExpression(
    "http4://testing.bbmkl.com/document/sth1234.zip")
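As a variation, the result can be returned as an Option instead of a sentinel string, which leaves the no-match case to the caller. This is only a sketch of that design choice: the names fileNamePattern and extractFileName and the second URL are hypothetical, and the dot before the extension is escaped here.

```scala
// Same idea as above, with the extension alternatives in a non-capturing group.
val fileNamePattern = """http4.*/(\w+\.(?:xlsx|xls|zip))$""".r

// Hypothetical helper: Some(fileName) on a match, None otherwise.
def extractFileName(expr: String): Option[String] =
  fileNamePattern.findFirstMatchIn(expr).map(_.group(1))

println(extractFileName("http4://testing.bbmkl.com/document/sth1234.zip")) // Some(sth1234.zip)
println(extractFileName("http4://testing.bbmkl.com/document/readme.txt"))  // None
```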
Gaurav Khare