I thought `\w` should match only letters (in many different languages) and digits. I use regex to search for songs on my disk. The pattern `song\W+artist` matches "song - artist", "'song' artist", "song(artist)", etc., but it won't match "song _ artist", because `_` is treated as a word character. Why? By the way, I can use the pattern `song[^A-Za-z]+artist`, but that only works with Latin letters.
-
No flag, but there are a lot of ways to fix it (`[^\W_]`, `[\w&&[^_]]`, `[\w-[_]]`, `(?!_)\w`); which of them work depends on your regex flavor. `_` is used in variable names, and `\w` was meant to match every character allowed in a variable name. – Wiktor Stribiżew Oct 08 '20 at 20:04
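
For the use case in the question, the `[^\W_]` idea can be turned around: put `_` into the separator class, so that `[\W_]+` accepts anything that is not a letter or digit. Here is a minimal sketch, assuming Python's `re` module (the question does not name an engine); `matches_song_artist` is a hypothetical helper made up for illustration.

```python
import re

# Separator: one or more characters that are either non-word characters or "_".
SEPARATOR = r"[\W_]+"

# Letters and digits only: "neither a non-word character nor an underscore".
ALNUM = r"[^\W_]"


def matches_song_artist(filename: str, song: str, artist: str) -> bool:
    """Hypothetical helper: True if `song` and `artist` appear in `filename`
    separated only by punctuation, whitespace, or underscores."""
    pattern = re.escape(song) + SEPARATOR + re.escape(artist)
    return re.search(pattern, filename) is not None


if __name__ == "__main__":
    names = ["song - artist", "'song' artist", "song(artist)", "song _ artist"]
    for name in names:
        print(repr(name), matches_song_artist(name, "song", "artist"))
```

In Python 3, `str` patterns are Unicode-aware by default, so `[^\W_]` also covers non-Latin letters, which `[^A-Za-z]` in the question cannot do.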
-
Then there is the ASCII way, `[a-zA-Z0-9]`, which is a faster approach. Calling it a _word_ character is really just an unfortunate choice of name. – Oct 08 '20 at 20:58
-
If you truly have a Unicode regex engine, then there are more characters like `_` that `\w` matches. By excluding just the underscore, you don't exclude those other characters. – Oct 08 '20 at 21:02
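
One family of such characters is Unicode connector punctuation (general category `Pc`), which the Unicode regex recommendations (UTS #18) include in `\w` alongside `_`. Whether a given engine follows that recommendation varies, so treat this as a sketch for inspecting the candidates rather than a statement about any particular engine. It uses only Python's standard `unicodedata` module; `connector_punctuation` is a hypothetical helper name.

```python
import sys
import unicodedata


def connector_punctuation() -> list[str]:
    """Return every code point whose general category is Pc
    (Connector_Punctuation), the category that "_" belongs to."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Pc"
    ]


if __name__ == "__main__":
    for ch in connector_punctuation():
        # Print the code point and its Unicode name for each candidate.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

If your engine does treat these as word characters, a separator class that only adds `_` back (such as `[\W_]`) will still miss them; listing them explicitly or using a Unicode property class is the safer route.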
-
If it is a Unicode engine, use something like `\p{alnum}`, a property that matches about 133,000 characters. – Oct 08 '20 at 21:09
-
It depends on the engine you're using, but in Perl and PCRE, running the regex `[\p{alnum}](?<!\w)` against the entire set of Unicode characters produces some 750 characters, such as marks and vowel signs. – Oct 08 '20 at 21:33