In the GWT tutorial where you build a stock watcher, there is this regular expression to check whether an input is valid:

if (!symbol.matches("^[0-9A-Z\\.]{1,10}$"))

This allows inputs of 1 to 10 characters that are digits, uppercase letters, or dots.

The part that confuses me is the \\.

I interpret this as an escaped backslash (\\) followed by a . that stands for any character. I thought the correct expression would be \. to escape the dot, but writing that produces an "Invalid escape sequence" error in Eclipse.

Am I missing the obvious here?

Philipp
  • It's a string literal - you want a backslash in the actual string, so you need to escape that for a normal Java string literal. Ignore the regex aspect: just `String x = "\.";` isn't valid Java code. – Jon Skeet Nov 14 '16 at 11:26
  • At least related: http://stackoverflow.com/questions/18503280/how-to-represent-backslash – T.J. Crowder Nov 14 '16 at 11:27
  • Oh... OK, kind of obvious. Thanks! – Philipp Nov 14 '16 at 11:28
  • You do not need to escape the dot in the character class at all. Escaped or not, it will match a literal dot. There is just no such escape sequence as `\.` in a Java string literal. This thread is about the same issue: [`Java doesn't work with regex \s, says: invalid escape sequence`](http://stackoverflow.com/questions/2733255/java-doesnt-work-with-regex-s-says-invalid-escape-sequence). – Wiktor Stribiżew Nov 14 '16 at 11:28
  • `\\.` is **not** an escaped backslash followed by a dot. As the others have mentioned, Java needs the backslash to be escaped in Strings, so this is equivalent to `\.` as a regex. If you want to have an escaped backslash in the regex, you'd have to write it like this: `\\\\.` where each `\\` represents one backslash in the regex. – QBrute Nov 14 '16 at 12:12

2 Answers

This is one of the hassles of regular expressions in Java. That \\ is not an escaped backslash at the regex level, just at the string level.

This string:

"^[0-9A-Z\\.]{1,10}$"

Defines this regular expression:

^[0-9A-Z\.]{1,10}$

...because the escape is consumed by the string literal.
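One way to check this is to print the string and its length at runtime. The following is only a minimal sketch: the class name `EscapeDemo` and the helper `regexText` are made up for illustration, and the sample inputs are arbitrary.

```java
public class EscapeDemo {
    // Hypothetical helper: returns the string that the regex engine actually receives.
    public static String regexText() {
        // In source code the backslash is doubled; the compiler stores a single one.
        return "^[0-9A-Z\\.]{1,10}$";
    }

    public static void main(String[] args) {
        String regex = regexText();
        System.out.println(regex);                    // prints ^[0-9A-Z\.]{1,10}$
        System.out.println(regex.length());           // prints 18 (one backslash, not two)
        System.out.println("GOOG.L".matches(regex));  // prints true
        System.out.println("goog".matches(regex));    // prints false (lowercase is not allowed)
    }
}
```

The printed text contains a single backslash, which is the form the regex engine parses.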

T.J. Crowder

\ is the escape character in a Java string literal. For instance, the newline character is written as \n. To place a literal \ in a Java string, you write \\.

So your Java string literal (the string as written in the code), "^[0-9A-Z\\.]{1,10}$", produces the actual string ^[0-9A-Z\.]{1,10}$ (with a single backslash), and that is what the regex engine sees. So, as you expected, this is \. in the regular expression.
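To see how the number of backslashes in the source changes what the regex means, here is a small sketch comparing the cases that come up in this thread. The class name `BackslashLevels` and the helper `fullMatch` are hypothetical; `Pattern.matches` is the standard `java.util.regex` call.

```java
import java.util.regex.Pattern;

public class BackslashLevels {
    // Hypothetical helper: true if the whole input matches the regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Source "\\." is the regex \. : a literal dot.
        System.out.println(fullMatch("\\.", "."));       // prints true
        System.out.println(fullMatch("\\.", "a"));       // prints false

        // Source "." is the regex . : any single character.
        System.out.println(fullMatch(".", "a"));         // prints true

        // Source "\\\\." is the regex \\. : a literal backslash, then any character.
        System.out.println(fullMatch("\\\\.", "\\a"));   // prints true

        // Inside a character class an unescaped dot is already literal.
        System.out.println(fullMatch("[.]", "."));       // prints true
        System.out.println(fullMatch("[.]", "a"));       // prints false
    }
}
```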

Thirler