I was trying to match the URL pattern `string.string.` for any number of `string.` segments, using
^([^\\W_]+.)([^\\W_]+.)$
as a first attempt, and it works for matching two consecutive segments. But when I generalize it to
^([^\\W_]+.)+$
it stops working and matches the wrong input "string.str_ing.".
Do you know what is incorrect with the second version?

Please escape the `.` since it is a metacharacter. – Jun 28 '20 at 18:30
`\w` includes the underscore too. Also, for a couple of years now, URLs may contain Unicode letters. – Joop Eggen Jun 28 '20 at 19:10
3 Answers
With
^([^\\W_]+.)([^\\W_]+.)$
you match any two words built from a restricted set of characters. Although you have not escaped the `.`, it still works: the first word `string` is matched, then any single character (that is what an unescaped `.` means), and then `string` again, followed by any single character.
In the second pattern, the unescaped dot (`.`) is part of a capturing group that repeats one or more times (because of the `+`), so any character is accepted as a separator. In other words, `string.str_ing.` is understood as:
string as the 1st word
str as the 2nd word
ing as the 3rd word
... because the unescaped dot (`.`) accepts any separator (both a literal `.` and `_`).
Escape the dot to make the regex work as intended:
^([^\\W_]+\.)+$
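To see the effect of the escape, here is a minimal Java sketch. It assumes the doubled backslashes in the question are Java string-literal escaping, so the escaped dot is written `\\.` in source. The class name `DotEscapeDemo` and the helper `matches` are made up for illustration.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Unescaped dot: '.' matches any character, so '_' is accepted as a separator.
    static final Pattern UNESCAPED = Pattern.compile("^([^\\W_]+.)+$");
    // Escaped dot: "\\." in Java source is the regex \. and matches only a literal '.'.
    static final Pattern ESCAPED = Pattern.compile("^([^\\W_]+\\.)+$");

    // True when the whole input matches the pattern.
    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"string.string.", "string.str_ing.", "a.b.c."};
        for (String s : inputs) {
            System.out.println(s
                    + "  unescaped=" + matches(UNESCAPED, s)
                    + "  escaped=" + matches(ESCAPED, s));
        }
    }
}
```

With these inputs, only `string.str_ing.` should differ between the two patterns: the unescaped version accepts it and the escaped version rejects it.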

You need to escape your `.` character, otherwise it matches any character, including `_`. This can be your generalised regex:
^([^\\W_]+\.?)+$
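The `?` after the escaped dot makes the dot optional, which changes what is accepted compared with a required dot. The following Java sketch compares the two; it again assumes Java string-literal escaping (`\\.` in source), and `OptionalDotDemo` and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

class OptionalDotDemo {
    // "\\.?" makes the literal dot optional after each word.
    static final Pattern OPTIONAL_DOT = Pattern.compile("^([^\\W_]+\\.?)+$");
    // "\\." requires a literal dot after every word, including the last one.
    static final Pattern REQUIRED_DOT = Pattern.compile("^([^\\W_]+\\.)+$");

    // True when the whole input matches the pattern.
    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"string.string.", "string.string", "string", "string.str_ing."};
        for (String s : inputs) {
            System.out.println(s
                    + "  optionalDot=" + matches(OPTIONAL_DOT, s)
                    + "  requiredDot=" + matches(REQUIRED_DOT, s));
        }
    }
}
```

Both patterns reject `string.str_ing.`, but the optional-dot version also accepts input without a trailing dot, or with no dot at all, such as `string`. Which behaviour is wanted depends on whether the trailing dot in `string.string.` is mandatory.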

`[^\W]` seems a weird choice - it's matching 'not not-a-word-character'. I haven't thought it through, but that sounds like it's equivalent to `\w`, i.e., matching a word character.
Either way, with `[^\W]` and `\w`, you're asking to match underscores - which is why it matches the string with the underscore. "Word characters" are uppercase alphabetics, lowercase alphabetics, digits, and underscore.
You probably want `[a-z]+` or maybe `[A-Za-z0-9]+`.
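The character classes discussed here are easy to check directly. The sketch below tests single characters against each class, and covers two readings of the question's `\\`: as Java string escaping for one backslash, and as literal regex text. `WordClassDemo` and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \w and [^\W] both accept the underscore.
        System.out.println(matches("\\w", "_"));       // true
        System.out.println(matches("[^\\W]", "_"));    // true

        // Adding '_' inside the negated class excludes the underscore.
        System.out.println(matches("[^\\W_]", "_"));   // false
        System.out.println(matches("[^\\W_]", "a"));   // true

        // If the regex text itself were [^\\W_] (Java source "[^\\\\W_]"),
        // the class would mean: anything except backslash, 'W' and '_'.
        System.out.println(matches("[^\\\\W_]", "W")); // false
        System.out.println(matches("[^\\\\W_]", "-")); // true
    }
}
```

Under the Java-string reading, `[^\W]` does behave like `\w`, while the class actually used in the question, `[^\W_]`, rejects the underscore. That suggests the unwanted match of `string.str_ing.` comes from the unescaped dot, not from the character class.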

Nope, it doesn't. The content of the `[]` says that anything except `/` literally (`//`). The `\W` (it should be `\w` anyway) doesn't work as a shortcut for `[a-zA-Z0-9_]` since the initial two backslashes (`\\`) have own meaning and the `W`/`w` character remains unescaped. The three ones should be included to take an effect (and a lower-case `w`). – Nikolas Charalambidis Jun 28 '20 at 18:40
There are no slashes, only backslashes, in the given regex. I assumed the \\ is just Java source syntax for a single \. Otherwise the expression is just bizarre - [^\\W_]+ matches a string of characters except for backslash, W, and underscore. Which might well have given the result seen, but it doesn't seem like a useful parsing, and I doubt it was intended. – user13784117 Jun 28 '20 at 21:02