
I'm learning Scala, so this is probably pretty noob-irific.

I want to have a multiline regular expression.

In Ruby it would be:

MY_REGEX = /com:Node/m

My Scala looks like:

val ScriptNode =  new Regex("""<com:Node>""")

Here's my match function:

def matchNode( value : String ) : Boolean = value match 
{
    case ScriptNode() => System.out.println( "found" + value ); true
    case _ => System.out.println("not found: " + value ) ; false
}

And I'm calling it like so:

matchNode( "<root>\n<com:Node>\n</root>" ) // doesn't work
matchNode( "<com:Node>" ) // works

I've tried:

val ScriptNode =  new Regex("""<com:Node>?m""")

And I'd really like to avoid having to use java.util.regex.Pattern. Any tips greatly appreciated.

Noel Yap
ed.

3 Answers


This is a very common problem when first using Scala Regex.

When you use a Regex in a Scala pattern match, it tries to match the whole string, as if the pattern were wrapped in "^" and "$" with multi-line mode off (multi-line mode is what lets ^ and $ also match at line breaks).
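To make the whole-string behaviour concrete, here is a minimal sketch using the regex from the question. The object name `AnchoredMatchDemo` and the helper `matchesWhole` are made up for illustration.

```scala
import scala.util.matching.Regex

object AnchoredMatchDemo {
  val ScriptNode: Regex = new Regex("""<com:Node>""")

  // The extractor succeeds only when the regex matches the entire input.
  def matchesWhole(s: String): Boolean = s match {
    case ScriptNode() => true
    case _            => false
  }

  def main(args: Array[String]): Unit = {
    val multiLine = "<root>\n<com:Node>\n</root>"
    println(matchesWhole("<com:Node>"))                // whole string is the node
    println(matchesWhole(multiLine))                   // node is only a substring
    println(ScriptNode.findFirstIn(multiLine).isDefined) // substring search
  }
}
```

The pattern match fails on the multi-line input because the node is only part of the string, while `findFirstIn` searches for a substring and finds it.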

The way to do what you want would be one of the following:

def matchNode( value : String ) : Boolean = 
  (ScriptNode findFirstIn value) match {    
    case Some(v) => println( "found" + v ); true    
    case None => println("not found: " + value ) ; false
  }

Which would find the first instance of ScriptNode inside value and return that instance as v (if you want the whole string, just print value). Or else:

val ScriptNode =  new Regex("""(?s).*<com:Node>.*""")
def matchNode( value : String ) : Boolean = 
  value match {    
    case ScriptNode() => println( "found" + value ); true    
    case _ => println("not found: " + value ) ; false
  }

Which would print the whole of value. In this example, (?s) activates dotall matching (i.e., "." also matches newlines), and the .* before and after the searched-for pattern lets it match wherever the node appears in the string. If you wanted "v" as in the first example, you could do this:

val ScriptNode =  new Regex("""(?s).*(<com:Node>).*""")
def matchNode( value : String ) : Boolean = 
  value match {    
    case ScriptNode(v) => println( "found" + v ); true    
    case _ => println("not found: " + value ) ; false
  }
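As a rough check of why (?s) matters, the sketch below runs the same pattern with and without the flag against the multi-line input from the question. `DotallDemo` and `capture` are hypothetical names used only for this illustration.

```scala
import scala.util.matching.Regex

object DotallDemo {
  val input = "<root>\n<com:Node>\n</root>"

  val withoutDotall: Regex = """.*(<com:Node>).*""".r
  val withDotall: Regex    = """(?s).*(<com:Node>).*""".r

  // Returns the captured node if the regex matches the entire input.
  def capture(re: Regex, s: String): Option[String] = s match {
    case re(v) => Some(v)
    case _     => None
  }

  def main(args: Array[String]): Unit = {
    // Without (?s), "." stops at line breaks, so the whole-string match fails.
    println(capture(withoutDotall, input))
    // With (?s), ".*" can span the newlines around the node.
    println(capture(withDotall, input))
  }
}
```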
user unknown
Daniel C. Sobral
    It may not be clear to skim-readers like myself, but the inclusion of `(?s)` is key here in regard to matching against multi-line strings. See http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#DOTALL – Synesso Feb 10 '11 at 01:14
    @Synesso your link is broken now. here is the java 7 equivalent http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL – harschware Jan 17 '15 at 00:10
    The link by @Synesso is broken. New link: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html – László van den Hoek Feb 20 '19 at 07:40

Just a quick and dirty addendum: the .r method on RichString (available through StringOps in later Scala versions) converts any string to a scala.util.matching.Regex, so you can do something like this:

"""(?s)a.*b""".r replaceAllIn ( "a\nb\nc\n", "A\nB" )

And that will return

A
B
c

I use this all the time for quick and dirty regex-scripting in the scala console.

Or in this case:

def matchNode( value : String ) : Boolean = {

    // Bind the regex to a name so it can be used as an extractor below
    val ScriptNode = """(?s).*(<com:Node>).*""".r

    value match {

       case ScriptNode(v) => System.out.println( "found" + v ); true    

       case _ => System.out.println("not found: " + value ) ; false
    }
}

Just my attempt to reduce the use of the word new in code worldwide. ;)
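In the same no-`new` spirit, a different technique from the one above: on Scala 2.10 or later, `Regex` has an `unanchored` method that makes the extractor search for a substring, so neither `.*` nor `(?s)` is needed. This is only a sketch, and the object name `UnanchoredDemo` is invented for the example.

```scala
object UnanchoredDemo {
  // `.r` builds the Regex; `unanchored` lets the extractor match a substring.
  val ScriptNode = """(<com:Node>)""".r.unanchored

  def matchNode(value: String): Boolean = value match {
    case ScriptNode(v) => println("found " + v); true
    case _             => println("not found: " + value); false
  }

  def main(args: Array[String]): Unit = {
    matchNode("<root>\n<com:Node>\n</root>")
    matchNode("<root>\n</root>")
  }
}
```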

Tristan Juricek

Just a small addition: you tried to use the (?m) (multiline) flag (although it might not be suitable here), but here is the right way to use it:

e.g. instead of

val ScriptNode =  new Regex("""<com:Node>?m""")

use

val ScriptNode =  new Regex("""(?m)<com:Node>""")

But again, the (?s) flag is more suitable for this question (I'm adding this answer only because the title is "Scala Regex enable Multiline option").
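For what (?m) actually changes, here is a small sketch: the flag makes ^ and $ match at line boundaries, which is useful with `findFirstIn` but does not by itself make a pattern match succeed on a multi-line string. The name `MultilineFlagDemo` and the anchored patterns are assumptions made for the illustration.

```scala
object MultilineFlagDemo {
  val input = "<root>\n<com:Node>\n</root>"

  // Without (?m), ^ and $ only match at the start and end of the whole input.
  val plain = """^<com:Node>$""".r
  // With (?m), ^ and $ also match just after and just before a line break.
  val multiline = """(?m)^<com:Node>$""".r

  // Pattern matching still needs the regex to cover the entire input.
  def wholeMatch(s: String): Boolean = s match {
    case multiline() => true
    case _           => false
  }

  def main(args: Array[String]): Unit = {
    println(plain.findFirstIn(input))     // no line starts the input with the node
    println(multiline.findFirstIn(input)) // the node sits on its own line
    println(wholeMatch(input))            // (?m) alone does not span lines
  }
}
```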

Eran Medan