32

What does this regex mean?

^[\w*]$
Mateen Ulhaq
TIMEX

6 Answers

77

Quick answer: ^[\w*]$ matches a string consisting of a single character, where that character is alphanumeric (a letter or digit), an underscore (_), or an asterisk (*).

Details:

  • The "\w" means "any word character", which usually means alphanumeric (letters and numbers, regardless of case) plus underscore (_).
  • The "^" anchors to the beginning of a string, and the "$" anchors to the end of a string, which means that, in this case, the match must start at the beginning of the string and end at the end of the string.
  • The [] means a character class, which means "match any character contained in the character class".
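As a quick check of the points above, here is a minimal sketch using Python's `re` module. The helper name `is_single_word_char_or_star` is made up for illustration and is not part of the question.

```python
import re

# The pattern from the question, written as a raw string.
PATTERN = re.compile(r'^[\w*]$')


def is_single_word_char_or_star(text):
    # True when text is exactly one word character or one asterisk.
    return PATTERN.match(text) is not None


# One character from the class matches; anything longer, empty,
# or outside the class does not.
for candidate in ['a', 'Z', '7', '_', '*', 'ab', '**', '', '-']:
    print(repr(candidate), is_single_word_char_or_star(candidate))
```

The loop prints True for the first five candidates and False for the rest.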

It is also worth mentioning that the normal quoting and escaping rules for strings make regular expressions awkward to type (every backslash would need to be escaped with another backslash). Python therefore has a raw string notation, in which backslashes are not treated as escape characters and so reach the regex engine intact; that is what the "r" at the beginning of the string literal is for.
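To illustrate the raw string point, a small sketch comparing raw and ordinary string literals. The variable names are only for illustration.

```python
import re

raw = r'\w'       # two characters: a backslash and the letter w
escaped = '\\w'   # the same two characters, written with an escaped backslash
same_pattern = (raw == escaped)

# The difference matters for escapes Python itself understands:
# '\b' in an ordinary string is a backspace character, while
# r'\b' reaches the regex engine as a word-boundary assertion.
with_raw = re.search(r'\bcat\b', 'a cat sat')
without_raw = re.search('\bcat\b', 'a cat sat')

print(same_pattern)             # True
print(with_raw is not None)     # True
print(without_raw is not None)  # False
```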

Note: Normally an asterisk (*) means "0 or more of the previous thing" but in the example above, it does not have that meaning, since the asterisk is inside of the character class, so it loses its "special-ness".
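The difference between an asterisk inside and outside the brackets can be seen in a short sketch (the names `inside` and `outside` are just labels for this example):

```python
import re

inside = re.compile(r'^[\w*]$')   # * is a literal member of the class
outside = re.compile(r'^\w*$')    # * is a quantifier: zero or more word characters

for text in ['abc', '*', '', 'a']:
    print(repr(text),
          inside.match(text) is not None,
          outside.match(text) is not None)
```

Here 'abc' and '' match only the `outside` pattern, '*' matches only the `inside` pattern, and 'a' matches both.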

For more information on regular expressions in Python, the two official references are the re module documentation and the Regular Expression HOWTO.

Mateen Ulhaq
Adam Batkin
  • 6
    In Python 3, the definition of `\w` takes into account Unicode character definitions by default, so it's much wider than just `[a-zA-Z0-9_]`; see https://docs.python.org/3/library/re.html#module-re for the gory details. – Bevan Jun 19 '17 at 01:35
  • The quick answer is **highly misleading**. `re.match(r'\w', '*') == None` – Mateen Ulhaq Nov 02 '18 at 22:36
  • @MateenUlhaq I don't know what you are trying to say, but I believe you are mistaken. OP was asking about `\w` and `*` inside of a bracket expression (`[]`). Your code sample has zero relevance to the question at-hand. `re.match(r'^[\w*]$', '*')` does in fact return a match. And thank you for the downvote. – Adam Batkin Nov 03 '18 at 23:57
  • Let me rephrase: when searching up ["python `\w`"](https://www.google.com/search?q=python+\w), this is the first SO result. The title doesn't really give any implication that it's `[\w*]` rather than `\w`. Thus, it's really easy to get the impression that `\w == [a-zA-Z_*]`. – Mateen Ulhaq Nov 04 '18 at 00:29
  • @MateenUlhaq You matched against `\w`, but that is not what the question or answer has; it has `[\w*]`, which **does** match `*` – user16714199 Feb 10 '22 at 15:17
  • @user16714199 I was commenting on [this answer version](https://stackoverflow.com/revisions/1576812/3) and [this incorrect question title](https://stackoverflow.com/revisions/1576789/3). The current state is much better than before. – Mateen Ulhaq Feb 11 '22 at 01:15
  • 1
    @Cbhihe am I missing something, or is \w the same as [a-zA-Z0-9_] and definitely not [^a-zA-Z0-9_] as you wrote? Just for the sake of clarity for anyone who comes along and reads it – massi Jul 06 '22 at 14:38
  • @massi: Yes, you are absolutely right. What I wrote is quite an unfortunate typo, being exactly the inverse of what I meant. Obviously `^[\w*]` is very different from `[^\w*]`... Since comments cannot be edited I will erase it right now. Tx much for the heads-up. – Cbhihe Jul 06 '22 at 14:56
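The points raised in these comments can be checked directly. The sketch below assumes Python 3 with `str` patterns, where `\w` is Unicode-aware unless the `re.ASCII` flag is given; the variable names are illustrative only.

```python
import re

e_acute = '\u00e9'  # the letter e with an acute accent

unicode_w = re.fullmatch(r'\w', e_acute)            # Unicode-aware by default
ascii_w = re.fullmatch(r'\w', e_acute, re.ASCII)    # restricted to [a-zA-Z0-9_]
ascii_class = re.fullmatch(r'[a-zA-Z0-9_]', e_acute)

# \w on its own never matches an asterisk; the class [\w*] does.
bare_w_star = re.match(r'\w', '*')
class_star = re.match(r'^[\w*]$', '*')

print(unicode_w is not None)    # True
print(ascii_w is not None)      # False
print(ascii_class is not None)  # False
print(bare_w_star is not None)  # False
print(class_star is not None)   # True
```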
2

\w matches a single word character: an alphanumeric character or the underscore. The * in your case is also inside the character class, so [\w*] matches any one character from [a-zA-Z0-9_*] (the * is interpreted literally).

See http://www.regular-expressions.info/reference.html

To quote:

\d, \w and \s --- Shorthand character classes matching digits, word characters, and whitespace. Can be used inside and outside character classes.
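A small Python sketch of the quoted rule, showing a shorthand class used both on its own and inside a bracketed class (the sample string is made up):

```python
import re

sample = 'a1 b2'

digits_only = re.findall(r'\d', sample)          # shorthand outside a class
digits_or_space = re.findall(r'[\d\s]', sample)  # shorthands inside a class

print(digits_only)      # ['1', '2']
print(digits_or_space)  # ['1', ' ', '2']
```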

Edit corrected in response to comment

Community
Jonathan Fingland
  • Not in the above regular expression. Since the `*` is within the character class, it becomes a member of the class. – Adam Batkin Oct 16 '09 at 08:28
2

As exhuma said, \w is any word-class character (alphanumeric as Jonathan clarifies).

However because it is in square brackets it will match:

  1. a single word character (alphanumeric or underscore) OR
  2. an asterisk (*)

So the whole regular expression matches:

  • the beginning of a line (^)
  • followed by either a single word character or an asterisk
  • followed by the end of a line ($)

so, when ^ and $ are allowed to match at line breaks (re.MULTILINE in Python), the following would match:

blah
z  <- matches this line
blah

or

blah
* <- matches this line
blah
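In Python, the line-by-line behaviour shown here depends on the `re.MULTILINE` flag; without it, ^ only matches at the start of the whole string. A minimal sketch, using the same sample lines:

```python
import re

text_z = 'blah\nz\nblah'
text_star = 'blah\n*\nblah'

# With MULTILINE, ^ and $ also match at the start and end of each line.
multiline_z = re.findall(r'^[\w*]$', text_z, re.MULTILINE)
multiline_star = re.findall(r'^[\w*]$', text_star, re.MULTILINE)

# Without the flag, the three-line string as a whole does not match.
whole_string = re.search(r'^[\w*]$', text_z)

print(multiline_z)     # ['z']
print(multiline_star)  # ['*']
print(whole_string)    # None
```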
atomice
0

From the beginning of this line, "Any number of word characters (letter, number, underscore)" until the end of the line.

I am unsure as to why it's in square brackets, as round brackets (parentheses, i.e. "(" and ")") are correct if you want the matched text returned.
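For what it's worth, the two kinds of brackets do different jobs in Python's `re` module: square brackets define a character class, while parentheses create a capturing group, and they can be combined. A small sketch with an illustrative pattern that is not taken from the question:

```python
import re

# Square brackets choose which single character may appear;
# the surrounding parentheses capture whatever was matched.
m = re.match(r'^([\w*])$', '*')

whole = m.group(0)     # the entire match
captured = m.group(1)  # the text captured by the parentheses

print(whole, captured)
```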

Ryan Bigg
0

\w is equivalent to [a-zA-Z0-9_]. I don't understand the * after it or the [] around it, because \w already is a class and * in class definitions makes no sense.

fforw
0

As said above, \w means any word, so you could use this in a context like the one below

view.aspx?url=[\w]

which means you can have any word as the value of the "url=" parameter

GaryDevenay
  • 1
    \w only matches a single character, not an entire word. You would need a quantifier like +, * or {n,m} to actually match an entire word (i.e. more than a single character) – Adam Batkin Oct 16 '09 at 08:40
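The comment's point about quantifiers can be seen in a short sketch. The URL and the value "hello" are made-up sample data.

```python
import re

url = 'view.aspx?url=hello'

# Without a quantifier, [\w] consumes exactly one character.
single = re.search(r'url=[\w]', url).group(0)

# With +, \w repeats one or more times and captures the whole word.
word = re.search(r'url=(\w+)', url).group(1)

print(single)  # url=h
print(word)    # hello
```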