45

This seems to match the rules I have defined, but I only started learning regex tonight, so I am wondering if it is correct.

Rules:

  • Usernames can consist of lowercase and capitals
  • Usernames can consist of alphanumeric characters
  • Usernames can consist of underscore and hyphens and spaces
  • Cannot be two underscores, two hyphens or two spaces in a row
  • Cannot have an underscore, hyphen or space at the start or end

Regex pattern:

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/
Cœur
Zim
  • 5
    This won't allow any non-Latin characters in usernames. If you want to be able to handle non-Latin characters you should use a built-in character class instead of explicitly defining which characters are letters and numbers. – Welbog Aug 03 '09 at 12:18
  • 2
    You can test your expression online: http://www.gskinner.com/RegExr/ – twk Aug 03 '09 at 12:24
  • `/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]*$/` is enough. No need to add `+` at the last – Ashish Gupta Aug 16 '15 at 13:45
  • I actually provided an answer to [this question](https://unix.stackexchange.com/a/435120/284004). The patterns provided are not fully accurate and can be simplified greatly. – brent saner Jul 24 '18 at 17:50
  • This will give you a good idea: https://ihateregex.io/expr/username – Geon George Jan 29 '20 at 17:27

10 Answers

82

The specs in the question aren't very clear, so I'll just assume the string can contain only ASCII letters and digits, with hyphens, underscores and spaces as internal separators. The meat of the problem is ensuring that the first and last characters are not separators, and that there's never more than one separator in a row (that part seems clear, anyway). Here's the simplest way:

/^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$/

After matching one or more alphanumeric characters, if there's a separator it must be followed by one or more alphanumerics; repeat as needed.
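To try this pattern out, here is a minimal Python sketch. It is not part of the original answer: the helper name `is_valid_username` and the sample names are made up for illustration, and it assumes Python's `re` module rather than whatever flavor the question targets.

```python
import re

# Pattern from the answer above: runs of alphanumerics joined by single separators.
USERNAME_RE = re.compile(r"[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*")


def is_valid_username(name: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    # fullmatch() anchors at both ends, so ^ and $ are left out of the pattern.
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["a", "a-b", "john_doe 42", "_abc", "abc-", "a__b", "a -b"]:
    print(repr(candidate), is_valid_username(candidate))
```

Using `fullmatch()` instead of `^...$` is a deliberate choice here: in Python, `$` also matches just before a trailing newline, which is rarely what you want when validating input.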

Let's look at regexes from some of the other answers.

/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/

This is effectively the same (assuming your regex flavor supports the POSIX character-class notation), but why make the separator optional? The only reason you'd be in that part of the regex in the first place is if there's a separator or some other, invalid character.

/^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/

On the other hand, this only works because the separator is optional. After the first separator, it can only match one alphanumeric at a time. To match more, it has to keep repeating the whole group: zero separators followed by one alphanumeric, over and over. If the second [a-zA-Z0-9] were followed by a plus sign, it could find a match by a much more direct route.
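As a rough check that the "one alphanumeric at a time" pattern and the more direct form accept the same names, the sketch below compares them on every short string over a small alphabet. This is an illustration in Python, not something from the original answer; the function name `patterns_agree` and the test alphabet are assumptions.

```python
import itertools
import re

# The pattern discussed above: optional separator, one alphanumeric per repetition.
ONE_AT_A_TIME = re.compile(r"[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*")
# The more direct form: a required separator followed by a run of alphanumerics.
DIRECT = re.compile(r"[a-zA-Z0-9]+(?:[_\s\-][a-zA-Z0-9]+)*")


def patterns_agree(max_len: int = 5, alphabet: str = "a1_- ") -> bool:
    """Hypothetical helper: compare both patterns on all strings up to max_len."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = "".join(chars)
            if bool(ONE_AT_A_TIME.fullmatch(text)) != bool(DIRECT.fullmatch(text)):
                return False
    return True


print(patterns_agree())
```

Agreement on short strings is only evidence, not a proof, but it supports the point that the two forms differ in how they get to a match rather than in what they accept.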

/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/

This uses unbounded lookbehind, which is a very rare feature, but you can use a lookahead to the same effect:

/^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$/

This performs essentially a separate search for two consecutive separators, and fails the match if it finds one. The main body then only needs to make sure all the characters are alphanumerics or separators, with the first and last being alphanumerics. Since those two are required, the name must be at least two characters long.
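The lookahead version can be exercised the same way. The following Python sketch is only an illustration (the helper name `passes_lookahead_check` is invented), and it shows the two-character minimum mentioned above.

```python
import re

# Lookahead variant from above: reject any string containing two separators in a row,
# then require an alphanumeric at each end.
NO_DOUBLE_SEPARATOR = re.compile(
    r"^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$"
)


def passes_lookahead_check(name: str) -> bool:
    """Hypothetical helper: True if the lookahead pattern accepts the whole name."""
    return NO_DOUBLE_SEPARATOR.fullmatch(name) is not None


for candidate in ["ab", "a-b", "a b_c", "a", "a--b", "a_ b", "-ab"]:
    print(repr(candidate), passes_lookahead_check(candidate))
```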

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/

This is your own regex, and it requires the string to start and end with at least two alphanumeric characters; if there are two separators within the string, there have to be exactly two alphanumerics between them. So ab, ab-cd and ab-cd-ef will match, but a, a-b and a-b-c won't.
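The six example names can be run against the asker's pattern directly. This is a small Python sketch for illustration only; the helper name `matches_original` is an assumption.

```python
import re

# The asker's pattern, copied unchanged from the question.
ORIGINAL = re.compile(
    r"^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$"
)


def matches_original(name: str) -> bool:
    """Hypothetical helper: True if the asker's pattern accepts the name."""
    return ORIGINAL.match(name) is not None


accepted = [n for n in ["ab", "ab-cd", "ab-cd-ef"] if matches_original(n)]
rejected = [n for n in ["a", "a-b", "a-b-c"] if not matches_original(n)]
print("accepted:", accepted)
print("rejected:", rejected)
```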

Also, as some of the commenters have pointed out, the (_|-| ) in your regex should be [-_ ]. That part's not incorrect, but if you have a choice between an alternation and a character class, you should always go with the character class: they're more efficient as well as more readable.

Again, I'm not worried about whether "alphanumeric" is supposed to include non-ASCII characters, or the exact meaning of "space", just how to enforce a policy of non-contiguous internal separators with a regex.

Cœur
Alan Moore
  • ok, and how to limit this? something like: ^([a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+){6,25}$ – Gabriel Anderson Mar 04 '14 at 01:37
  • @GabrielAnderson: You mean the whole name (including separators) must be at least 6 and at most 25 characters long? That requires a lookahead. Just add `(?=.{6,25}$)` to the beginning, right after the anchor (`^`). – Alan Moore Mar 04 '14 at 12:11
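A sketch of the length-limited variant suggested in the comment above, applied to the answer's pattern and written for Python's `re` module. The helper name `is_valid_length_limited` is hypothetical; the 6 to 25 range comes from the comment.

```python
import re

# The lookahead checks the overall length (6 to 25 characters, separators included)
# before the main pattern checks the structure.
LENGTH_LIMITED = re.compile(
    r"^(?=.{6,25}$)[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$"
)


def is_valid_length_limited(name: str) -> bool:
    """Hypothetical helper: structure check plus the 6 to 25 character limit."""
    return LENGTH_LIMITED.fullmatch(name) is not None


for candidate in ["abcde", "abcdef", "ab-cde", "a" * 25, "a" * 26, "abcde-"]:
    print(repr(candidate), is_valid_length_limited(candidate))
```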
32

Your regular expression can be simplified to:

/^[a-zA-Z0-9]+([_ -]?[a-zA-Z0-9])*$/

Visualized with Regexper:

Visualization of username validation regex.

As you can see, a username always has to start with an alphanumeric character. Special characters (_, space, -) have to be followed by an alphanumeric character. The last character has to be an alphanumeric character.

Good Night Nerd Pride
  • important tool for [regex](https://regexper.com/#%5E%5Ba-zA-Z%5D%2B%28%3F%3A%5B_-%5D%3F%5Ba-zA-Z%5D%29*%24) – MSTdev Apr 01 '20 at 05:39
7
 ([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*

is zero or more repetitions of alphanum, dashspace, alphanum (where "dashspace" means an underscore, hyphen or space).

So it would match

a_aa_aa_a

but not

aaaaa

The complete regexp can't match

a_aaaaaaaaa_a for example.

Let's look back at what you want:

* Usernames can consist of lowercase and capitals or alphanumeric characters
* Usernames can consist of alphanumeric characters
* Usernames can consist of underscore and hyphens and spaces
* Cannot be two underscores, two hyphens or two spaces in a row
* Cannot have an underscore, hyphen or space at the start or end

The beginning is simple: just match an alphanum, then (ignoring the two-in-a-row rule) an (alphanum or dashspace)*, and at the end an alphanum again.

To prevent the two dashspaces in a row you probably need to understand lookahead/lookbehind.

Oh, and regarding the other answer: please download Expresso, it REALLY helps you understand those things.

froh42
3

I suggest writing some unit tests to put the regex through its paces. This will also help a few months from now when you find a problem with the regex and need to update it.
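As a starting point, such tests might look like the Python `unittest` sketch below. The pattern under test and the sample names are assumptions chosen for illustration; swap in whichever regex you settle on.

```python
import re
import unittest

# Assumed pattern under test; replace it with the regex you actually use.
USERNAME_RE = re.compile(r"[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*")


class UsernamePatternTest(unittest.TestCase):
    def test_accepts_valid_names(self):
        for name in ["a", "abc", "a-b", "a_b c", "John Smith"]:
            with self.subTest(name=name):
                self.assertIsNotNone(USERNAME_RE.fullmatch(name))

    def test_rejects_invalid_names(self):
        for name in ["", "_a", "a_", "a__b", "a- b", " a", "a!"]:
            with self.subTest(name=name):
                self.assertIsNone(USERNAME_RE.fullmatch(name))
```

Saved in a file, this can be run with `python -m unittest`. When a bug report comes in later, add the offending name to one of the lists first, then fix the pattern.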

Tim Booker
3
  1. "Alphanumeric" isn't just [a-zA-Z0-9]; it also covers accented, Cyrillic, Greek and other letters, all of which can appear in usernames.

  2. (_|-| ) can be replaced by [-_ ] character class
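Both points can be combined in one pattern. The sketch below assumes Python 3's `re` module, where `\w` on `str` patterns covers Unicode letters, digits and the underscore, so `[^\W_]` is used as a stand-in for "any letter or digit". Other regex flavors spell this differently (for example with `\p{L}` and `\p{N}`), and the helper name is hypothetical.

```python
import re

# [^\W_] means "a word character that is not an underscore", i.e. a Unicode
# letter or digit in Python 3. [-_ ] is the character class for the separators.
UNICODE_USERNAME_RE = re.compile(r"[^\W_]+(?:[-_ ][^\W_]+)*")


def is_valid_unicode_username(name: str) -> bool:
    """Hypothetical helper: like the ASCII check, but accepts non-Latin letters."""
    return UNICODE_USERNAME_RE.fullmatch(name) is not None


for candidate in ["Cœur", "Иван", "Łukasz_42", "a__b", "_a", "a-"]:
    print(repr(candidate), is_valid_unicode_username(candidate))
```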

Alan Moore
ymv
  • 5
    `[_- ]` is "everything between underscore and space". You want to have the hyphen first to have it interpreted properly: `[-_ ]` – Welbog Aug 03 '09 at 12:20
1

By the looks of it, that rule wouldn't match something like "a_bc", "ab_c", "a_b" or "a_b_c".

Try: /^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/ which matches the above cases but not any combination of spaces, dashes or underscores next to each other. E.g., "_-" or " _" are not allowed.

dampkwab
1

Using the POSIX character class for alphanumeric characters to make it work for accented and other foreign alphabetic characters:

/^[[:alnum:]]+([-_ ]?[[:alnum:]])*$/

More efficient (prevents captures):

/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/

These also prevent sequences of more than one space/hyphen/underscore in combination. It doesn't follow from your specification whether that is desirable, but your own regex seems to indicate this is what you want.

Lars Haugseth
0

Another recommendation for Expresso 3.0 here - very easy to use and build up strings with.

Daniel May
0

Your regex doesn't work. The hard part is the check for consecutive spaces/hyphens. You could use this one, which uses look-behind:

/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/
Philippe Leybaert
0

In my opinion, adding a length limit to this pattern would be better:

[a-zA-Z0-9]+([_ -]?[a-zA-Z0-9]){5,40}$


jainashish