7

How do I create a regex that matches all alphanumerics without a number at the beginning?

Right now I have "^[0-9][a-zA-Z0-9_]"

For example, 1ab would not match, ab1 would match, 1_bc would not match, bc_1 would match.

Apollo

7 Answers

17

There are three things wrong with what you've written.

First, to negate a character class, you put the ^ inside the brackets, not before them. ^[0-9] means "any digit, at the start of the string"; [^0-9] means "anything except a digit".

Second, [^0-9] will match anything that isn't a digit, not just letters and underscores. You really want to say that the first character "is not a digit, but is a digit, letter, or underscore", right? While it isn't impossible to say that, it's a lot easier to just merge that into "is a letter or underscore".

Third, you forgot to repeat the last character set. As written, you're matching exactly two characters, so b1 will work, but b12 will not.

So:

[a-zA-Z_][a-zA-Z0-9_]*

Debuggex Demo

In other words: one letter or underscore, followed by zero or more letters, digits, or underscores.
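
As a quick check, here is a minimal Python sketch that runs this pattern against the examples from the question. Requiring the whole string to match via re.fullmatch is an assumption on my part (the pattern above is unanchored), and the names IDENTIFIER and is_identifier are only illustrative.

```python
import re

# Pattern from the answer: a letter or underscore, then any number of
# letters, digits, or underscores.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier(text):
    """Return True if the whole string matches (fullmatch is an assumption)."""
    return IDENTIFIER.fullmatch(text) is not None

# The four examples from the question, plus b12 from the third point.
for sample in ["1ab", "ab1", "1_bc", "bc_1", "b12"]:
    print(sample, is_identifier(sample))
```

With these inputs, only the strings that start with a digit are rejected.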

I'm not entirely sure this is what you actually want, at least if the regex is your whole parser. For example, in foo-bar, do you want the bar to get matched? If so, in 123spam, do you want the spam to get matched? But it's what you were trying to write.
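
To make that question concrete, here is a small sketch (Python, using only the standard re module) comparing an unanchored search with a whole-string match. Which behavior is wanted depends on the use case; the variable names are just for illustration.

```python
import re

pattern = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
inputs = ["foo-bar", "123spam"]

# Unanchored: findall picks out every identifier-like piece in the input.
unanchored = {s: pattern.findall(s) for s in inputs}
print(unanchored)  # {'foo-bar': ['foo', 'bar'], '123spam': ['spam']}

# Whole-string: fullmatch rejects both inputs outright.
anchored = {s: pattern.fullmatch(s) is not None for s in inputs}
print(anchored)  # {'foo-bar': False, '123spam': False}
```

So unanchored, the bar in foo-bar and the spam in 123spam both get matched; anchored, neither string matches at all.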

abarnert
  • @abarnert thanks for this answer (and +1 for the Debuggex Demo). This is exactly what I needed. – Apollo Oct 27 '14 at 20:38
6

This should do it:

^[^0-9][a-zA-Z0-9_]+$

Explanation:

  • ^: Matches the beginning of the line
  • [^0-9]: Matches any one character that is not a digit
  • [a-zA-Z0-9_]+: Matches one or more alphanumeric characters or underscores
  • $: Matches the end of the line
Linuxios
  • I'm pretty sure this isn't what he wants. After all, `-foo` doesn't have a number at the beginning, so it will match your expression, but I don't think it's what he's looking for. – abarnert Oct 27 '14 at 20:38
  • Well, it would have been better with a more complete set of test input; I'm _guessing_ he doesn't want `-foo` based on the way he phrased his description, but it would be better to _know_ that… – abarnert Oct 27 '14 at 20:41
  • @abarnert: Reading the question again, I'm pretty sure you're right. +1'd your answer. – Linuxios Oct 27 '14 at 20:43
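
The point raised in these comments can be checked with a short Python sketch. The names loose and strict are illustrative; strict is the letter-or-underscore variant with anchors added, which is an assumption about what the asker wants.

```python
import re

loose = re.compile(r"^[^0-9][a-zA-Z0-9_]+$")      # pattern from this answer
strict = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")  # first char must be a letter or underscore

samples = ["-foo", "ab1", "1ab", "a"]

# Map each sample to (matches loose, matches strict).
results = {s: (bool(loose.match(s)), bool(strict.match(s))) for s in samples}
for sample, (is_loose, is_strict) in results.items():
    print(sample, is_loose, is_strict)
```

Here -foo matches the loose pattern but not the strict one. Also note that the single character a is rejected by the loose pattern, because + requires at least one more character after the first.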
2

You can use \D to match any non-digit:

/^\D[a-zA-Z0-9_]+$/ should work.
Sven Eberth
0

You can use this: ^[A-Za-z_][A-Za-z0-9_]*$

FliegendeWurst
Mazdak
0

Another suggestion, try this:

\b([a-zA-Z][^\s]*)

You can use this code to iterate over the results:

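import re

subject = "ab1 1ab bc_1"  # sample input (an assumption; substitute your own text)
reobj = re.compile(r"\b([a-zA-Z][^\s]*)")
for match in reobj.finditer(subject):
    start = match.start()  # index where the match begins
    end = match.end()      # index just past the end of the match
    text = match.group()   # the matched text
    print(start, end, text)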
Tommy Andersen
0

You can use this regex:

^[a-z]\w+$

The idea of the regex is:

^[a-z]   -> must start with a letter
\w+$     -> followed by one or more word characters (\w is shorthand for [A-Za-z0-9_])

Bear in mind that this relies on the regex flags i (case-insensitive) and m (multiline).

The python code you can use is:

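import re

# Python 3: the Python 2 ur'' prefix is no longer valid, so use a plain raw string
p = re.compile(r'^[a-z]\w+$', re.MULTILINE | re.IGNORECASE)
test_str = "would match\nab1\nbc_1\n\nwould not match\n1_bc\n1ab"

print(p.findall(test_str))  # ['ab1', 'bc_1']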
Federico Piazza
0

This is the right answer:

^(?!^[0-9].*$).*

It matches the whole line if the line does not start with a digit.

Here is another pattern that does a similar job:

^[^0-9]+.*
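
A short Python sketch of how the two patterns behave, using re.match so that matching starts at the beginning of the string. The sample strings and variable names are my own picks for illustration.

```python
import re

lookahead = re.compile(r"^(?!^[0-9].*$).*")  # negative lookahead version
negated = re.compile(r"^[^0-9]+.*")          # negated character class version

samples = ["ab1", "1ab", "-foo", ""]

# Map each sample to (matches lookahead, matches negated).
outcome = {
    s: (lookahead.match(s) is not None, negated.match(s) is not None)
    for s in samples
}
for sample, flags in outcome.items():
    print(repr(sample), flags)
```

Both patterns reject 1ab and accept ab1, but they also accept -foo, since neither restricts the line to alphanumerics. They differ on the empty string: the lookahead version matches it, while the negated class needs at least one character.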
Zen Of Kursat