I thought `\w` should match only letters (in many different languages) and digits. I use regex to search for songs on my disk. The pattern "song\W+artist" matches "song - artist", "'song' artist", "song(artist)", etc. But it won't match "song _ artist", because "_" is treated as a word character. Why? By the way, I can use the pattern "song[^A-Za-z]+artist", but that only works with Latin letters.
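A minimal reproduction of the behaviour described above, assuming Python's `re` module (the question does not say which engine is in use) and using the file names from the question as sample data:

```python
import re

# Sample names taken from the question; used only for illustration.
names = ["song - artist", "'song' artist", "song(artist)", "song _ artist"]

pattern = re.compile(r"song\W+artist")

# \W is the complement of \w, and \w includes "_", so an underscore
# in the separator interrupts the \W+ run and the match fails.
matches = {name: bool(pattern.search(name)) for name in names}

for name, ok in matches.items():
    print(f"{name!r}: {ok}")
```

Only the name with the underscore in the separator is reported as not matching.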

LazyBum Q
  • No flag does this, but there are a lot of ways to fix it (`[^\W_]`, `[\w&&[^_]]`, `[\w-[_]]`, `(?!_)\w`). `_` is used in variable names, and `\w` was meant to match every character allowed in a variable name. – Wiktor Stribiżew Oct 08 '20 at 20:04
  • Then there is the ASCII way, `[a-zA-Z0-9]`, which is a faster approach. The term _word_ is just an unfortunate name, really. –  Oct 08 '20 at 20:58
  • If you truly have a Unicode regex engine, then there are more symbols like `_` that `\w` includes. By excluding just the underscore, you don't exclude those other characters. –  Oct 08 '20 at 21:02
  • If it is a Unicode engine, use something like `\p{alnum}`, a property that matches about 133,000 characters. –  Oct 08 '20 at 21:09
  • It depends on the engine you're using, but in Perl and PCRE the regex `[\p{alnum}](?<!\w)`, run against the entire set of Unicode characters, matches some 750 characters, such as marks and vowel signs. –  Oct 08 '20 at 21:33
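
The workarounds from the comments can be sketched as follows, assuming Python's `re` module (the engine in the question is not stated). Only `[^\W_]` and `(?!_)\w` are used here: `[\w&&[^_]]` is Java-style class intersection and `[\w-[_]]` is .NET-style class subtraction, and as far as I know Python's `re` supports neither. The sample strings are made up for illustration.

```python
import re

# Separator: one or more characters that are either non-word or "_".
separator_fix = re.compile(r"song[\W_]+artist")

# "Word character except underscore", in two spellings from the comments.
no_underscore_a = re.compile(r"[^\W_]+")        # negated class: not \W and not "_"
no_underscore_b = re.compile(r"(?:(?!_)\w)+")   # lookahead rejects "_" before each \w

names = ["song - artist", "song _ artist", "song__artist"]
separator_hits = [bool(separator_fix.search(n)) for n in names]

# In Python 3, \w on str patterns is Unicode-aware, so non-Latin letters count.
sample = "песня_artist_42"
tokens_a = no_underscore_a.findall(sample)
tokens_b = no_underscore_b.findall(sample)

print(separator_hits)
print(tokens_a)
print(tokens_b)
```

For the original use case the first pattern is probably the simplest change: it keeps `\W` and just adds `_` to the set of separator characters. As a later comment notes, a Unicode-aware `\w` may still include other non-letter characters, so this excludes the underscore only.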

0 Answers