Whilst you can't use a regex to parse general HTML, you can probably get away with it in this case. In Groovy, you can put the (?s) flag at the start of the pattern to make the dot match newlines. You should also probably use the (?i) flag to make your regex case-insensitive. You can combine these as (?is).
For example:
def titleTagWithNoLineBreaks = "<title>This is a title</title>"
def titleTagWithLineBreaks = """<title>This is
a title</title>"""
// Note the (?is) at the beginning of the regex
// The 'i' makes the regex case-insensitive
// The 's' makes the dot match newline characters
def pattern = ~/(?is)<title>(.*?)<\/title>/
def matcherWithNoLineBreaks = titleTagWithNoLineBreaks =~ pattern
def matcherWithLineBreaks = titleTagWithLineBreaks =~ pattern
assert matcherWithNoLineBreaks.size() == 1
assert matcherWithLineBreaks.size() == 1
assert matcherWithLineBreaks[0][1].replaceAll(/\n/,' ') == "This is a title"
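The example above only exercises the 's' flag. As a rough sketch of what the 'i' flag adds, here is the same pattern run against an upper-case tag. The input string and the variable names are made up for illustration, and the last assert just shows that the pattern without the flags does not match this input.

```groovy
// Hypothetical input: upper-case tags with the title split across two lines
def upperCaseTitle = """<TITLE>This is
a title</TITLE>"""

def flaggedPattern = ~/(?is)<title>(.*?)<\/title>/
def flaggedMatcher = upperCaseTitle =~ flaggedPattern

// (?i) lets <title> match <TITLE>; (?s) lets .*? span the line break
assert flaggedMatcher.find()
assert flaggedMatcher.group(1).replaceAll(/\s+/, ' ') == "This is a title"

// Without the flags the same input is not matched at all
assert !(upperCaseTitle =~ /<title>(.*?)<\/title>/).find()
```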
Hope that helps.