
I'm trying to capture the content from a multiline regex. It doesn't match.

val text = """<p>line1 
    line2</p>"""

val regex = """(?m)<p>(.*?)</p>""".r

var result = regex.findFirstIn(text).getOrElse("")

This returns an empty string.

I added the (?m) flag for multiline mode, but it doesn't seem to help in this case.

If I remove the line break the regex works.

I also found this but couldn't get it working.

How do I match the content between the <p> tags? I want everything in between, including the line breaks.

Thanks in advance!

Community
User
  • As general advice, http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags indicates it might be better not to use a regex to parse HTML in most cases. – Martijn Jun 15 '13 at 23:07

2 Answers


To activate dotall mode in Scala, use (?s) instead of (?m).

(?s) means the dot can match newlines

(?m) means ^ and $ match at the beginning and end of each line
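
To make the difference concrete, here is a minimal sketch that runs both flags against the text from the question. The object and value names (FlagDemo, withM, withS, and so on) are made up for illustration; only the regexes and the sample text come from the thread.

```scala
object FlagDemo {
  // Same sample text as in the question: a <p> element spanning two lines.
  val text = "<p>line1 \n    line2</p>"

  // (?m) only changes ^ and $; the dot still stops at the newline, so nothing matches.
  val withM: Option[String] =
    """(?m)<p>(.*?)</p>""".r.findFirstMatchIn(text).map(_.group(1))

  // (?s) lets the dot match newlines, so the whole element content is captured.
  val withS: Option[String] =
    """(?s)<p>(.*?)</p>""".r.findFirstMatchIn(text).map(_.group(1))

  // What (?m) is actually for: ^ and $ match at every line boundary.
  val anchoredPerLine: List[String] =
    """(?m)^line\d$""".r.findAllIn("line1\nline2").toList

  // Without (?m), ^ and $ only anchor to the whole input, so nothing matches here.
  val anchoredWholeInput: List[String] =
    """^line\d$""".r.findAllIn("line1\nline2").toList

  def main(args: Array[String]): Unit = {
    println(withM)              // None
    println(withS)              // Some(...) holding both lines
    println(anchoredPerLine)    // List(line1, line2)
    println(anchoredWholeInput) // List()
  }
}
```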

Casimir et Hippolyte
  • Yup... `(?m)` will change the behaviour of `^` and `$`. Confusing names ;) – fge Jun 15 '13 at 21:47
  • @fge confusing but m = multi, s = single line. – som-snytt Jun 16 '13 at 01:05
  • Something that took me a little while to realize, and was necessary for my application: m and s are not mutually exclusive. You can have (?ms) and it'll work like you'd expect. – Sushisource Nov 15 '16 at 18:18

In case it's not obvious at this point, "How do I match the content":

scala> val regex = """(?s)<p>(.*?)</p>""".r

scala> (regex findFirstMatchIn text).get group 1
res52: String = 
line1 
    line2

More idiomatically,

scala> text match { case regex(content) => content }
res0: String =
line1
    line2

scala> val embedded = s"stuff${text}morestuff"
embedded: String =
stuff<p>line1
    line2</p>morestuff

scala> val regex = """(?s)<p>(.*?)</p>""".r.unanchored
regex: scala.util.matching.UnanchoredRegex = (?s)<p>(.*?)</p>

scala> embedded match { case regex(content) => content }
res1: String =
line1
    line2
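
Note that both forms above fail loudly on input without a <p> element: .get throws NoSuchElementException on None, and a match with a single case throws scala.MatchError. If that matters, something along these lines may be safer. This is only a sketch; ParagraphDemo, firstParagraph and allParagraphs are hypothetical names, not part of any library.

```scala
import scala.util.matching.Regex

object ParagraphDemo {
  // Same dotall pattern as above; the lazy .*? stops at the first closing tag.
  val paragraph: Regex = """(?s)<p>(.*?)</p>""".r

  // Returns None instead of throwing when there is no <p> element.
  def firstParagraph(input: String): Option[String] =
    paragraph.findFirstMatchIn(input).map(_.group(1))

  // Collects the content of every <p> element, in order of appearance.
  def allParagraphs(input: String): List[String] =
    paragraph.findAllMatchIn(input).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    println(firstParagraph("no paragraph here"))
    println(allParagraphs("<p>a\nb</p> text <p>c</p>"))
  }
}
```

Since findFirstMatchIn and findAllMatchIn search anywhere in the input, this works on embedded text without needing .unanchored.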
som-snytt