How do I create a regex that matches all alphanumerics without a number at the beginning?
Right now I have "^[0-9][a-zA-Z0-9_]"
For example, 1ab would not match, ab1 would match, 1_bc would not match, bc_1 would match.
There are three things wrong with what you've written.
First, to negate a character class, you put the ^ inside the brackets, not before them. ^[0-9] means "any digit, at the start of the string"; [^0-9] means "anything except a digit".
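A quick way to see the difference between the two placements of ^ is to try both, sketched here with Python's re module. The variable names and sample strings are only illustrations, not part of the question:

```python
import re

# ^ outside the brackets anchors the match to the start of the string.
starts_with_digit = re.compile(r"^[0-9]")
# ^ inside the brackets negates the character class.
not_a_digit = re.compile(r"[^0-9]")

digit_first = bool(starts_with_digit.search("1ab"))   # "1ab" starts with a digit
letter_first = bool(starts_with_digit.search("ab1"))  # "ab1" does not
first_non_digit = not_a_digit.search("1ab").group()   # first character that is not a digit
```

Here digit_first is True, letter_first is False, and first_non_digit is "a".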
Second, [^0-9] will match anything that isn't a digit, not just letters and underscores. You really want to say that the first character "is not a digit, but is a digit, letter, or underscore", right? While it isn't impossible to say that, it's a lot easier to just merge that into "is a letter or underscore".
Also, you forgot to repeat the last character set. As-is, you're matching exactly two characters, so b1 will work, but b12 will not.
So:
[a-zA-Z_][a-zA-Z0-9_]*
In other words: one letter or underscore, followed by zero or more letters, digits, or underscores.
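To check the pattern against the examples from the question, here is a minimal sketch that assumes you want the whole string to match (hence re.fullmatch). The helper name is_identifier is made up for the illustration:

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier(text):
    """Return True if the entire string matches the pattern."""
    return IDENTIFIER.fullmatch(text) is not None

# The four examples from the question, plus b12 from the point about repetition.
results = {s: is_identifier(s) for s in ["1ab", "ab1", "1_bc", "bc_1", "b12"]}
```

With this, 1ab and 1_bc come out False, while ab1, bc_1, and b12 come out True.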
I'm not entirely sure this is what you actually want, at least if the regex is your whole parser. For example, in foo-bar, do you want the bar to get matched? If so, in 123spam, do you want the spam to get matched? But it's what you were trying to write.
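The caveat shows up as soon as the unanchored pattern is used to search inside a longer string rather than to validate a whole one. A small sketch, assuming Python's re module:

```python
import re

pattern = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# Searching finds every run that looks like an identifier, wherever it starts.
in_foo_bar = pattern.findall("foo-bar")
in_123spam = pattern.findall("123spam")

# Requiring the whole string to match rejects 123spam outright.
whole_123spam = pattern.fullmatch("123spam")
```

Searching gives ['foo', 'bar'] and ['spam'], while the full match on 123spam is None. If the second behavior is what you want, anchor the pattern or use fullmatch.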
This should do it:
^[^0-9][a-zA-Z0-9_]+$
Explanation:
^ : Matches the beginning of the line
[^0-9] : Matches any one character except a digit
[a-zA-Z0-9_]+ : Matches one or more alphanumeric characters or underscores
$ : Matches the end of the line
You can use \D for any non-digit:
/^\D[a-zA-Z0-9_]+$/ should work!
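Before relying on either of these two patterns, it may be worth checking a couple of edge cases. The sketch below uses Python's re module; the sample list is an assumption chosen to include a one-character string and a string starting with a hyphen:

```python
import re

negated = re.compile(r"^[^0-9][a-zA-Z0-9_]+$")
non_digit = re.compile(r"^\D[a-zA-Z0-9_]+$")

samples = ["ab1", "bc_1", "1ab", "1_bc", "a", "-ab"]
negated_hits = [s for s in samples if negated.search(s)]
non_digit_hits = [s for s in samples if non_digit.search(s)]
```

Both lists come out as ['ab1', 'bc_1', '-ab']. The single letter a is rejected because + demands at least one more character after the first, and -ab is accepted because "not a digit" also covers punctuation.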
Another suggestion, try this:
\b([a-zA-Z][^\s]*)
You can use this code to iterate over the results:
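import re

# Example input; replace with your own text.
subject = "ab1 1ab bc_1"
reobj = re.compile(r"\b([a-zA-Z][^\s]*)")
for match in reobj.finditer(subject):
    start = match.start()  # index where the match begins
    end = match.end()      # index just past the end of the match
    text = match.group()   # the matched text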
You can use this regex:
^[a-z]\w+$
The idea of the regex is that
^[a-z] -> has to start with a letter
\w+$ -> followed by one or more word characters (\w is the shortcut for [A-Za-z0-9_])
Bear in mind the regex flags i for case-insensitive and m for multiline.
The Python code you can use is:
import re
# Python 3: a plain raw string replaces the Python 2-only ur'' prefix.
p = re.compile(r'^[a-z]\w+$', re.MULTILINE | re.IGNORECASE)
test_str = "would match\nab1\nbc_1\n\nwould not match\n1_bc\n1ab"
print(p.findall(test_str))  # ['ab1', 'bc_1']
This is the right answer:
^(?!^[0-9].*$).*
It matches the whole line if the line does not start with a number.
Another pattern that does a similar job:
^[^0-9]+.*
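A rough comparison of these two patterns, applied line by line with Python's re module. The sample lines are an assumption, with foo-bar included to show that neither pattern restricts the rest of the line to alphanumerics:

```python
import re

lookahead = re.compile(r"^(?!^[0-9].*$).*")
negated_run = re.compile(r"^[^0-9]+.*")

lines = ["ab1", "1ab", "bc_1", "1_bc", "foo-bar"]
lookahead_hits = [s for s in lines if lookahead.match(s)]
negated_run_hits = [s for s in lines if negated_run.match(s)]
```

Both keep ab1, bc_1, and foo-bar, and drop the two lines that start with a digit. If the rest of the line must be letters, digits, or underscores only, the character-class patterns from the earlier answers are the stricter choice.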