
I want to match an HTML file:

If the file starts with whitespace followed by an end tag such as </sometag>, return true.

Otherwise, return false.

I used the regex "(\\s)*</(\\w)*>.*", but it doesn't match input such as \n </p>\n </blockquote> ...
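A minimal reproduction of the failure might look like the sketch below. It assumes the check is done with String.matches(), which has to match the whole input; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical reproduction; assumes the check uses String.matches(),
// which succeeds only if the regex matches the ENTIRE input.
public class WithoutDotall {
    static final String REGEX = "(\\s)*</(\\w)*>.*";

    static boolean check(String html) {
        return html.matches(REGEX);
    }

    public static void main(String[] args) {
        String html = "\n </p>\n </blockquote> ...";
        // (\s)* consumes "\n ", </(\w)*> matches "</p>", but ".*" stops at the
        // next '\n' because '.' excludes line terminators by default.
        System.out.println(check(html));        // false
        System.out.println(check(" </p> ...")); // true: no line terminator after the tag
    }
}
```

The second call succeeds only because nothing after the tag contains a newline, which points at `.` rather than `\s` as the culprit.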

JackWM
  • @Gabe I thought \s matches newlines. See http://www.vogella.com/articles/JavaRegularExpressions/article.html: \s is "A whitespace character, short for [ \t\n\x0b\r\f]". – JackWM Aug 27 '12 at 21:55
  • @JackWM: You're right; it's `.` that doesn't necessarily match `\n`. See the `DOTALL` mode. – Gabe Aug 27 '12 at 21:56
  • I think you should consider reading this SO answer: http://stackoverflow.com/questions/4026115/regex-word-boundary-but-for-white-space-beginning-of-line-or-end-of-line-only – Sal Aug 27 '12 at 22:02

2 Answers


Thanks to Gabe for the help; he is correct. By default, . doesn't match \n, so I need to turn on DOTALL mode.

To do so, add the embedded flag (?s) to the beginning of the regex, i.e. (?s)(\\s)*</(\\w)*>.*
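A small sketch of the DOTALL fix, showing both the embedded (?s) flag and the equivalent Pattern.DOTALL compile flag. The class and method names are hypothetical; the regex is the one from the question.

```java
import java.util.regex.Pattern;

public class WithDotall {
    // Embedded flag (?s) turns on DOTALL, so '.' also matches line terminators.
    static final Pattern EMBEDDED = Pattern.compile("(?s)(\\s)*</(\\w)*>.*");
    // Equivalent: pass Pattern.DOTALL as a compile flag instead of writing (?s).
    static final Pattern FLAGGED = Pattern.compile("(\\s)*</(\\w)*>.*", Pattern.DOTALL);

    static boolean startsWithEndTag(String html) {
        return EMBEDDED.matcher(html).matches();
    }

    static boolean startsWithEndTagFlag(String html) {
        return FLAGGED.matcher(html).matches();
    }

    public static void main(String[] args) {
        String html = "\n </p>\n </blockquote> ...";
        System.out.println(startsWithEndTag(html));         // true
        System.out.println(startsWithEndTagFlag(html));     // true
        System.out.println(startsWithEndTag("<p>\n</p>"));  // false: starts with a start tag
    }
}
```

Note that (\\w)* also accepts an empty name, so "</>" passes too; \\w+ would require at least one word character.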

JackWM

You can also do this:

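// s is the HTML text to check; uses java.util.regex.Pattern and java.util.regex.Matcher
Pattern p = Pattern.compile("(\\s)*</(\\w)*>");
Matcher m = p.matcher(s);
return m.lookingAt(); // true only if the pattern matches at the very start of s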

lookingAt() only checks whether the beginning of the string matches the pattern, rather than requiring the whole string to match as matches() does. That way the trailing .* and DOTALL are not needed.
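A self-contained version of the lookingAt() approach, contrasted with matches() on the same pattern, might look like this. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtCheck {
    // No trailing ".*" and no DOTALL needed: lookingAt() is anchored only at the start.
    private static final Pattern END_TAG_PREFIX = Pattern.compile("(\\s)*</(\\w)*>");

    static boolean startsWithEndTag(String s) {
        Matcher m = END_TAG_PREFIX.matcher(s);
        return m.lookingAt();
    }

    static boolean wholeInputIsEndTag(String s) {
        // matches() requires the entire input to match the pattern.
        return END_TAG_PREFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        String html = "\n </p>\n </blockquote> ...";
        System.out.println(startsWithEndTag(html));        // true
        System.out.println(wholeInputIsEndTag(html));      // false: input continues past "</p>"
        System.out.println(startsWithEndTag("text </p>")); // false: "text" comes first
    }
}
```

Unlike find(), which would succeed on "text </p>" by locating the tag anywhere in the input, lookingAt() fails as soon as the start of the input doesn't fit the pattern.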

TimK