Your regex says: "Find any one of the characters `-.:alnum`, then capture any number of any characters into the first capture group". In the first test, it found `-` as the first character, then captured `mystr` in the first capture group. If the regex contains any groups, `findall` returns a list of the captured groups rather than the full matches, so the matched `-` is not included.
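
A minimal sketch of that `findall` behaviour. The input string and both patterns here are made up for illustration; they are not the ones from the question.

```python
import re

text = "-mystr"  # hypothetical input, not the string from the question

# With a capture group, findall returns only what the group captured.
with_group = re.findall(r"-(.*)", text)

# Without a group, findall returns the full match, including the "-".
without_group = re.findall(r"-.*", text)

print(with_group)     # ['mystr']
print(without_group)  # ['-mystr']
```

If you need the full match while still using groups, a non-capturing group `(?:...)` or `re.finditer` with `m.group(0)` are the usual options.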
Your second test found `u` as one of the `-.:alnum` characters (as none of `qwerty` matched any), then captured and returned the rest after it, `io`.
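
To see this concretely, here is a sketch that assumes the character class was effectively `[-.:alnum]` and the test string was `qwertyuio`. Both are reconstructed from the description above, so treat them as assumptions.

```python
import re

# Assumed reconstruction: a plain set of the literal characters
# '-', '.', ':', 'a', 'l', 'n', 'u', 'm' followed by a greedy capture.
pattern = r"[-.:alnum](.*)"
text = "qwertyuio"  # assumed test string

m = re.search(pattern, text)
first_hit = m.group(0)[0]   # the single character matched by the class
captured = m.group(1)       # everything after it

print(first_hit)                  # u
print(captured)                   # io
print(re.findall(pattern, text))  # ['io']
```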
As @revo notes in the comments, `[....]` is a character class, which matches any one character in it. To include a POSIX character class (like `[:alnum:]`) inside it, you need two sets of brackets. Also, a character class has no order; the fact that you included `-` inside it just means it is one of the characters that can be matched, not that a `-` must come before the alphanumeric characters. Finally, if you want to match any number of alphanumerics, you have your quantifier `*` on the wrong thing.
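
A small illustration of the "no order" point, using made-up classes and a made-up input: the two classes below list the same characters in a different order and match exactly the same things.

```python
import re

text = "b-a"  # hypothetical input

# Same set of characters, listed in a different order.
# (A '-' at the very start or end of a class is a literal hyphen.)
hits_a = re.findall(r"[-ab]", text)
hits_b = re.findall(r"[ba-]", text)

print(hits_a)  # ['b', '-', 'a']
print(hits_b)  # ['b', '-', 'a']
```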
Thus, "match `-`, then any number of alphanumeric characters" would be `-([[:alnum:]]*)`, except that Python's `re` module does not support POSIX character classes. So you have to write your own: `-([A-Za-z0-9]*)`.
However, that will not capture anything useful from your string because the intervening space is, as you note, not an alphanumeric character, so the group stops before it. To account for that, use `-\s*([A-Za-z0-9]*)`.
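
A quick check of the two patterns side by side. The input `"- mystr"` is an assumption, chosen only because it has a space after the `-`.

```python
import re

text = "- mystr"  # assumed input with a space after the hyphen

# Without \s*, the group hits the space immediately and captures nothing.
no_space = re.findall(r"-([A-Za-z0-9]*)", text)

# With \s*, the whitespace is consumed before the group starts.
with_space = re.findall(r"-\s*([A-Za-z0-9]*)", text)

print(no_space)    # ['']
print(with_space)  # ['mystr']
```

Note that `[A-Za-z0-9]` covers ASCII only; if you need non-ASCII letters and digits as well, you would have to widen the class yourself.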