I want to split abcd\r\nabc\r\nppp into (abcd\r\nabc, ppp) with the regex "(.*)\r\n(.*)".r.

But the regex match fails, as shown here:

object Regex extends App {
  val r = "(.*)\r\n(.*)".r
  val str = "abcd\r\nabc\r\nppp"

  str match {
    case r(a,b) =>
      println((a,b))
    case _ =>
      println("fail - ")
  }
}

The console prints fail -.

It works fine if the regex is matched against abcd\r\nppp. Here is the code again:

object Regex extends App {
  val r = "(.*)\r\n(.*)".r
  val str = "abcd\r\nppp"

  str match {
    case r(a,b) =>
      println((a,b))
    case _ =>
      println("fail - ")
  }
}

Besides, I don't want to replace \r\n with other characters. That would waste computing resources, because the code is used in a performance-sensitive stage.

Thanks

LoranceChen

1 Answer

Dot does not match line terminators such as \r and \n by default, so neither .* can get past the \r\n in the middle of the string and the whole match fails. You can change that by specifying the DOTALL flag for your regex. That's done by adding (?s) to the beginning of the pattern (the s stands for "single-line" mode, which is what Perl calls this flag):

val r = "(?s)(.*)\r\n(.*)".r
val str = "abcd\r\nabc\r\nppp"
str match {
  case r(a, b) => println(a -> b)
}

This prints (abcd\r\nabc,ppp), with the \r\n inside the first element shown as an actual line break.
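For reference, the inline (?s) flag should behave the same as passing Pattern.DOTALL when compiling the pattern directly with java.util.regex, which Scala's Regex is built on. A minimal sketch, where the object name DotAllSketch and the helper split are made up for illustration:

```scala
import java.util.regex.Pattern

// Hypothetical helper object, not part of the original question.
object DotAllSketch {
  // Compile with the DOTALL flag instead of embedding (?s) in the pattern.
  val p: Pattern = Pattern.compile("(.*)\r\n(.*)", Pattern.DOTALL)

  // Returns the two captured groups if the whole input matches, None otherwise.
  def split(s: String): Option[(String, String)] = {
    val m = p.matcher(s)
    if (m.matches()) Some((m.group(1), m.group(2))) else None
  }
}
```

Calling DotAllSketch.split("abcd\r\nabc\r\nppp") gives the same two groups as the (?s) version.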

If you want to split at the first \r\n rather than the last one, add ? to the first group:

val r = "(?s)(.*?)\r\n(.*)".r

This makes the wildcard non-greedy, so that it matches the shortest possible string rather than the longest, which is the default.
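To see the difference side by side, here is a small sketch that runs both patterns against the string from the question. The object and value names are only for illustration:

```scala
// Hypothetical comparison object, not part of the original question.
object GreedyVsNonGreedy {
  val greedy    = "(?s)(.*)\r\n(.*)".r   // first group takes as much as it can
  val nonGreedy = "(?s)(.*?)\r\n(.*)".r  // first group takes as little as it can

  val str = "abcd\r\nabc\r\nppp"

  // Greedy: the split happens at the last \r\n.
  val atLast: Option[(String, String)] = str match {
    case greedy(a, b) => Some((a, b))
    case _            => None
  }

  // Non-greedy: the split happens at the first \r\n.
  val atFirst: Option[(String, String)] = str match {
    case nonGreedy(a, b) => Some((a, b))
    case _               => None
  }
}
```

Here atLast holds ("abcd\r\nabc", "ppp") and atFirst holds ("abcd", "abc\r\nppp").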

Dima