5

My scala version 2.7.7

Im trying to extract an email adress from a larger string. the string itself follows no format. the code i've got:

import scala.util.matching.Regex
import scala.util.matching._
val Reg = """\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
"yo my name is joe : joe@gmail.com" match {
    case Reg(e) => println("match: " + e)
    case _ => println("fail")
}

the Regex passes in RegExBuilder but does not pass for scala. Also if there is another way to do this without regex that would be fine also. Thanks!

skaffman
  • 398,947
  • 96
  • 818
  • 769
Luigimax
  • 546
  • 1
  • 5
  • 12

3 Answers3

7

As Alan Moore pointed out, you need to add the (?i) to the beginning of the pattern to make it case-insensitive. Also note that using the Regex directly matches the whole string. If you want to find one within a larger string, you can call findFirstIn() or use one of the similar methods of Regex.

val reg = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
reg findFirstIn "yo my name is joe : joe@gmail.com"  match {
    case Some(email) => println("match: " + email)
    case None => println("fail")
}
Mitch Blevins
  • 13,186
  • 3
  • 44
  • 32
  • Not the solution i went with, but it works! thanks! (silly me, did not check what class was being returned : would have found Some(x)) – Luigimax May 17 '10 at 05:04
3

It looks like you're trying to do a case-insensitive search, but you aren't specifying that anywhere. Try adding (?i) to the beginning of the regex:

"""(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
Alan Moore
  • 73,866
  • 12
  • 100
  • 156
1

Well, the ways to do it other than REs are probably a lot messier. The next step up would probably the a combinator parser. A lot of random string dissection code would be even more general and almost certainly a whole lot more painful. In part what's a suitable tactic depends on how complete (and how strict or lenient) your recognizer needs to be. E.g., the common form: Rudolf Reindeer <rudy.caribou@north_pole.rth> is not accepted by your RE (even after the case-sensitivity is relaxed). Full-blown RFC 2822 address parsing is rather challenging for an RE-based approach.

Randall Schulz
  • 26,420
  • 4
  • 61
  • 81
  • Did someone say RFC 2822? (There are two breaks in the stuff. For this to work, they should be single ticks but escaping is hard.) `(?:[a-z0-9!#$%&'*+/=?^_``{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_``{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])` – Mike May 18 '10 at 00:13