Regular expression for alphanumeric and underscores

Question

Is there a regular expression which checks if a string contains only upper and lowercase letters, numbers, and underscores?

It is a pity that different regex engines have different means to match alphanumerics. A question like this (rather vague, with no language/regex flavor indicated) requires a very long, or at least a very organized answer dwelling on each flavor. — Wiktor Stribiżew, Jun 16 '16 at 12:53

score 1221 · Answer 1 · edited Oct 17 '22 at 19:54

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.

As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.
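
As a minimal sketch of the explicit form in Python (the helper name `is_valid` is made up for illustration), `re.fullmatch` can stand in for the `^` and `$` anchors. This matters in Python because `$` also matches just before a trailing newline, so `^[a-zA-Z0-9_]*$` used with `re.match` would accept "abc\n":

```python
import re

# Explicit ASCII class from the answer. fullmatch() anchors both ends,
# so no ^ or $ is needed and a trailing newline is not silently accepted.
ONLY_WORD_CHARS = re.compile(r"[a-zA-Z0-9_]*")


def is_valid(text: str) -> bool:
    """Return True if text consists solely of ASCII letters, digits, or underscores."""
    return ONLY_WORD_CHARS.fullmatch(text) is not None


print(is_valid("snake_case_42"))  # True
print(is_valid("kebab-case"))     # False
print(is_valid(""))               # True, because * allows zero characters
print(is_valid("abc\n"))          # False
```

Swapping `*` for `+` in the pattern rejects the empty string.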

edited Oct 17 '22 at 19:54 by Peter Mortensen
answered Dec 03 '08 at 04:33 by Charlie

If you ever go to Germany or if you ever see just about any German text you'll see what I'm saying. – Windows programmer Dec 03 '08 at 06:42
\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc. – Jan Goyvaerts Dec 03 '08 at 07:45
The original question did say "upper and lowercase letters", so it would seem that "letters" from non-Latin scripts should match. – Hakanai Oct 24 '11 at 22:24
`[\p{upper}\p{lower}\p{gc=Number}_]` is all you need to do this right, presuming there are no combining characters. – tchrist Jun 10 '12 at 05:09
I've seen this in many places, but it still allows the '$' character for me. All other special characters are blocked that I've tested so far. – Induster Jul 31 '12 at 19:50
I get "No ending delimiter '^' found", when I use this pattern with preg_match – Chris Harrison Feb 19 '13 at 05:14
It looks like preg_match requires your pattern to be enclosed with delimiters, which are normally slashes. So you would need "/^[a-zA-Z0-9_]*$/". See this question for more info: http://stackoverflow.com/questions/6445133/no-ending-delimiter-found-error. See also this page: http://forums.phpfreaks.com/topic/108241-no-ending-delimiter-found-in-what-have-i-done/ – Charlie Feb 20 '13 at 14:37
What's going on with all the up-votes. This is not correct. It only works for English. If you are going to make an edit, EDIT it. Don't add on an "Edit:", just make it correct. – doug65536 Oct 05 '13 at 17:45
I like how you broke down the regular expressions too – JohnMerlino Feb 09 '14 at 21:22
Upvote for actually breaking down and explaining the pattern! Well done! – SomeRandomDeveloper Sep 08 '14 at 15:34
@heisenberg YES. x100. I took formal languages a few years ago and this brought it all back. – jlaverde May 29 '15 at 18:07
what about characters like "öäüßÿ...." --> Characters in other languages, which have accents etc.? – unknown6656 Sep 11 '15 at 14:33
`+` doesn't work on some grep implementations. The lexicon is limited, be careful. – Sandburg Aug 30 '18 at 07:44

score 459 · Answer 2 · edited Oct 17 '22 at 19:55

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_] in ASCII-only flavors such as ECMAScript, which is pretty much what you want. In Unicode-aware flavors, \w also matches letters and digits from other scripts.

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.
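
How much `\w` covers depends on the flavor. As an illustration in Python 3 (the constant names are made up for this sketch), `\w` is Unicode-aware by default for `str` patterns, and the `re.ASCII` flag narrows it to `[a-zA-Z0-9_]`:

```python
import re

# In Python 3, \w on str patterns is Unicode-aware by default;
# re.ASCII restricts it to [a-zA-Z0-9_].
UNICODE_WORD = re.compile(r"\w+")
ASCII_WORD = re.compile(r"\w+", re.ASCII)

for sample in ["nut_cracker_12", "šēēā", "кукареку", "no spaces"]:
    print(
        sample,
        UNICODE_WORD.fullmatch(sample) is not None,
        ASCII_WORD.fullmatch(sample) is not None,
    )
```

The first sample passes both checks, the two non-Latin samples pass only the Unicode-aware one, and the sample containing a space fails both.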

edited Oct 17 '22 at 19:55 by Peter Mortensen
answered Dec 05 '08 at 05:25 by kch

`\w` isn’t usually restricted to ASCII alone. – tchrist Jun 10 '12 at 05:09
English is not the only language in the world, so this should be the accepted answer, not the `[a-z]` and its variations. `\w` will capture non-latin characters too. Like `šēēā` or `кукареку` – Alex from Jitbit Sep 11 '17 at 18:21
Validated on page 318 of the O'Reilly "Mastering Regular Expressions" – guidotex Nov 16 '18 at 19:30
`\w` is equivalent to `[a-zA-Z0-9_]` in ECMAScript (i.e. what runs in your modern Web browser), implying both are restricted to ASCII there. – Armen Michaeli Nov 30 '20 at 21:15
If you're using Javascript, you may want `/\p{L}/u` (note the unicode flag). Demo: `"é".match(/\w/)` ❌, `"é".match(/\p{L}/u)` ✅ – V. Rubinetti Nov 08 '22 at 18:23

score 49 · Answer 3 · edited Dec 03 '08 at 08:11

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then, to indicate that the entire string must match, you use:

^

to indicate that the string must start with that character, and

$

to indicate that the string must end with that character. Then use

\w+ or \w*

to indicate "1 or more" or "0 or more". Putting it all together, we have:

^\w*$

edited Dec 03 '08 at 08:11
answered Dec 03 '08 at 05:08 by Anton

\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc. – Jan Goyvaerts Dec 03 '08 at 07:45
They are equivalent with ECMAScript. – Armen Michaeli Nov 30 '20 at 21:16

score 47 · Answer 4 · edited Dec 29 '14 at 05:40

Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:

^[[:alnum:]_]+$

However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:

^\w+$

Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non-ASCII strings. Range-based regexes won't, since they rely on the underlying ordering of the ASCII characters, which may differ from other character sets, and will therefore exclude some non-ASCII characters (letters such as œ) that you might want to capture.

score 42 · Answer 5 · edited Oct 17 '22 at 20:36

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

This requires at least one upper- or lowercase alphanumeric character or underscore. If the string can be zero-length, just replace the + with *:

^[A-Za-z0-9_]*$

If diacritics need to be included (such as cedilla - ç), then you would need to use the word character class \w, which does the same as the above but, in Unicode-aware flavors, also includes characters with diacritics:

^\w+$

Or

^\w*$

edited Oct 17 '22 at 20:36 by Peter Mortensen
answered Dec 03 '08 at 04:31 by BenAlabaster

Well now that you mention it, I also missed a whole bunch of other French characters... – BenAlabaster Dec 03 '08 at 05:54
\w is the same as [\w] with less typing effort – Jan Goyvaerts Dec 03 '08 at 07:49
Yeah, you still need the + or * and the ^ and $ - \w just checks that it contains word characters, not that it *only* contains word characters... – BenAlabaster Dec 03 '08 at 14:30
oddly, this still allows the $ sign. – Induster Jul 31 '12 at 19:51
@Induster, it's because of what BenAlabaster just pointed out – Sebas Apr 09 '16 at 02:02

score 24 · Answer 6 · edited Oct 17 '22 at 20:32

Use

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).

edited Oct 17 '22 at 20:32 by Peter Mortensen
answered Dec 03 '08 at 04:31 by Drew Hall

When dealing with languages like Portuguese, better use the ```^\w+$``` to match letters with accent. – fellyp.santos Dec 16 '22 at 20:35

score 22 · Answer 7 · edited Oct 17 '22 at 20:00

In computer science, an alphanumeric value often means that the first character is not a digit but a letter or an underscore. After that, each character can be 0-9, A-Z, a-z, or underscore (_).

Here is how you would do that:

Tested under PHP:

$regex = '/^[A-Za-z_][A-Za-z\d_]*$/';

Or take

^[A-Za-z_][A-Za-z\d_]*$

and place it in your development language.
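
For instance, a rough Python version of this identifier-style check could look like the sketch below (the names `IDENTIFIER` and `is_identifier` are illustrative). It spells out `0-9` instead of `\d`, because `\d` in Python also matches non-ASCII decimal digits unless `re.ASCII` is set:

```python
import re

# First character: letter or underscore; the rest: letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if text looks like an ASCII identifier."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("_tmp1"))  # True
print(is_identifier("1tmp"))   # False, starts with a digit
print(is_identifier(""))       # False, the first character is mandatory
```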

edited Oct 17 '22 at 20:00 by Peter Mortensen
answered Jan 31 '12 at 13:38 by Danuel O'Neal

score 15 · Answer 8 · edited Oct 17 '22 at 19:56

Use lookaheads to do the "at least one" stuff. Trust me, it's much easier.

Here's an example that would require 1-10 characters, containing at least one digit and one letter:

^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$

Note: I could have used \w, but then ECMA/Unicode considerations come into play, increasing the character coverage of the \w "word character".
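
The lookahead pattern can be tried out in Python roughly as follows. This is a sketch: the helper name is made up, and `re.ASCII` is added here as an assumption so that `\d` only means 0-9:

```python
import re

# Two lookaheads assert "at least one digit" and "at least one letter";
# the character class with {1,10} then consumes the whole string.
PATTERN = re.compile(r"(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}", re.ASCII)


def has_letter_and_digit(text: str) -> bool:
    """Return True for 1-10 ASCII alphanumerics with at least one letter and one digit."""
    return PATTERN.fullmatch(text) is not None


print(has_letter_and_digit("abc123"))       # True
print(has_letter_and_digit("abcdef"))       # False, no digit
print(has_letter_and_digit("123456"))       # False, no letter
print(has_letter_and_digit("abcdefghij1"))  # False, 11 characters
```

To also allow `_` and `-`, they would be added to the final character class only, for example `[A-Za-z0-9_-]{1,10}`.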

edited Oct 17 '22 at 19:56 by Peter Mortensen
answered Nov 12 '10 at 18:20 by boooloooo

How would we do if we want to add _ and - to the list? – Rahi Sep 30 '15 at 11:26

score 13 · Answer 9 · edited Oct 17 '22 at 20:07

This works for me. I found this in O'Reilly's "Mastering Regular Expressions":

/^\w+$/

Explanation:

- ^ asserts position at the start of the string
- \w matches any word character (equal to [a-zA-Z0-9_] in JavaScript)
- + is a quantifier: it matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
- $ asserts position at the end of the string

Verify yourself:

const regex = /^\w+$/;
const str = `nut_cracker_12`;
let m;

if ((m = regex.exec(str)) !== null) {
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

score 10 · Answer 10 · edited Oct 17 '22 at 20:28

Try these multi-lingual extensions I have made for string.

IsAlphaNumeric - The string must contain at least one alpha (letter in Unicode range, specified in charSet) and at least one number (specified in numSet). Also, the string should consist only of alpha and numbers.

IsAlpha - The string should contain at least one alpha (in the language charSet specified) and consist only of alpha.

IsNumeric - The string should contain at least one number (in the language numSet specified) and consist only of numbers.

The charSet/numSet range for the desired language can be specified. The Unicode ranges are available on Unicode Chart.

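API (these extension methods must be declared in a static class and need using System.Text.RegularExpressions;):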

public static bool IsAlphaNumeric(this string stringToTest)
{
    // English
    const string charSet = "a-zA-Z";
    const string numSet = @"0-9";

    // Greek
    //const string charSet = @"\u0388-\u03EF";
    //const string numSet = @"0-9";

    // Bengali
    //const string charSet = @"\u0985-\u09E3";
    //const string numSet = @"\u09E6-\u09EF";

    // Hindi
    //const string charSet = @"\u0905-\u0963";
    //const string numSet = @"\u0966-\u096F";

    return Regex.Match(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet +@"]+$").Success;
}

public static bool IsNumeric(this string stringToTest)
{
    //English
    const string numSet = @"0-9";

    //Hindi
    //const string numSet = @"\u0966-\u096F";

    return Regex.Match(stringToTest, @"^[" + numSet + @"]+$").Success;
}

public static bool IsAlpha(this string stringToTest)
{
    //English
    const string charSet = "a-zA-Z";

    return Regex.Match(stringToTest, @"^[" + charSet + @"]+$").Success;
}

Usage:

// English
string test = "AASD121asf";

// Greek
//string test = "Ϡϛβ123";

// Bengali
//string test = "শর৩৮";

// Hindi
//string test = @"क़लम३७ख़";

bool isAlphaNum = test.IsAlphaNumeric();

@Shah : I have added the only alphabets (and only numbers too). — Shantanu, Apr 20 '12 at 03:27

score 9 · Answer 11 · edited Oct 17 '22 at 19:52

The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl -w

my $arg1 = $ARGV[0];

# Check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
    print "Failed.\n";
} else {
    print "Success.\n";
}

edited Oct 17 '22 at 19:52 by Peter Mortensen
answered Dec 03 '08 at 04:31 by Jay

The pattern in your code is correct, but the pattern above only checks a single instance. – BenAlabaster Dec 03 '08 at 04:35
That was intentional, code sample was intended as a clarifying usage in actually checking a string. Also why code has the beginning and end of line markers as well which are not in the regex example. – Jay Dec 03 '08 at 04:46
@Windows programmer - not sure if you're just trying to be humorous or clever, but alphanumeric specifically refers to the latin alphabet and arabic numerals, so wouldn't include ñ or any of the other special chars you've referenced in the comments here. – Jay Dec 03 '08 at 05:04
@Jay: I think your answer would be a lot clearer if the regex above the source code snippet was the proper regex, rather than a partial regex. People who don't know Perl will look at your regex, but not at the Perl snippet. – Jan Goyvaerts Dec 03 '08 at 07:48
@Windows programmer - http://en.wikipedia.org/wiki/Alphanumeric - latin *alphabet*, not "latin character set" which is what includes diacritics etc. Purely a semantics issue, but I personally go with the common usage of the term alphanumeric as A-Z and 0-9. – Jay Dec 05 '08 at 04:55
@Jan - added the full regex anyway, though there's already an accepted answer so it probably doesn't matter. Helps if people specify the language they're working in in the first place so we don't have to guess ;) – Jay Dec 05 '08 at 04:56
ñ is a letter of the alphabet in Spanish, including in Latin America. – Windows programmer Dec 05 '08 at 05:57
"I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores" doesn't limit it to Latin letters. "The following regex matches alphanumeric characters and underscore" doesn't limit it to Latin letters. "^[a-zA-Z0-9_]+$" fails. – Windows programmer Dec 05 '08 at 06:02

score 6 · Answer 12 · edited Oct 17 '22 at 20:06

This should work in most of the cases.

/^[\d]*[a-z_][a-z\d_]*$/gi

And by most I mean,

abcd       True
abcd12     True
ab12cd     True
12abcd     True

1234       False

Explanation

^ ... $ - anchor the pattern to the start and end of the string
[\d]* - match zero or more digits
[a-z_] - match one letter or underscore
[a-z\d_]* - match zero or more letters, digits, or underscores
/gi - g matches globally (unnecessary here, since the pattern is anchored at both ends) and i makes the match case-insensitive
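
The table above can be reproduced with a short Python sketch. It is a translation under assumptions: `re.IGNORECASE` replaces the `i` flag, `re.ASCII` keeps the ranges strictly ASCII, and `fullmatch` replaces the anchors:

```python
import re

# Optional leading digits, then one mandatory letter or underscore,
# then any mix of letters, digits, and underscores.
PATTERN = re.compile(r"[0-9]*[a-z_][a-z0-9_]*", re.IGNORECASE | re.ASCII)

for sample in ["abcd", "abcd12", "ab12cd", "12abcd", "1234"]:
    print(f"{sample:<8} {PATTERN.fullmatch(sample) is not None}")
```

Only `1234` is rejected, because the pattern requires at least one letter or underscore.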

edited Oct 17 '22 at 20:06 by Peter Mortensen
answered Dec 24 '19 at 06:06 by Chinmaya Pati

The original question didn't have a requirement that the letter shall be present. – Dmitry Kuzminov Dec 24 '19 at 06:30
Which letter are you talking about? My regex contains the one asked in the question. Alphabets, numbers, underscore – Chinmaya Pati Dec 25 '19 at 13:18
the `1234` is the word from the language requested by author. Your language is more restrictive. – Dmitry Kuzminov Dec 25 '19 at 22:06

score 5 · Answer 13 · edited Oct 17 '22 at 20:03

For those of you looking for Unicode alphanumeric matching, you might want to do something like:

^[\p{L}\p{Nd}_]+$

Further reading is at Unicode Regular Expressions (Unicode Consortium) and at Unicode Regular Expressions (Regular-Expressions.info).
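
Not every engine supports `\p{...}` property classes. Python's built-in `re` module does not, for example. A rough standard-library equivalent of "letters, decimal digits, and underscore" can be sketched with `unicodedata` (the function name below is made up for illustration):

```python
import unicodedata


def is_unicode_alnum_underscore(text: str) -> bool:
    # Non-empty, and every character is a letter (general category L*),
    # a decimal digit (Nd), or an underscore.
    if not text:
        return False
    return all(
        ch == "_"
        or unicodedata.category(ch).startswith("L")
        or unicodedata.category(ch) == "Nd"
        for ch in text
    )


print(is_unicode_alnum_underscore("šēēā_42"))    # True
print(is_unicode_alnum_underscore("кукареку"))   # True
print(is_unicode_alnum_underscore("two words"))  # False
```

Like the pattern it imitates, this sketch rejects combining marks, so decomposed text such as "e" followed by U+0301 fails unless it is normalized to NFC first.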

edited Oct 17 '22 at 20:03 by Peter Mortensen
answered Apr 03 '12 at 14:57 by Agustin

If you just want Latin, use \p{Latin} instead of \p{L} – Agustin Apr 04 '12 at 02:38

score 4 · Answer 14 · edited Jan 11 '12 at 01:51

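For me there was an issue in that I want to distinguish between alpha, numeric, and alphanumeric. To ensure that an alphanumeric string contains at least one letter and at least one digit, I used the following, with the alternation grouped so that ^ and $ apply to both branches and with an optional trailing run so that strings such as a1b are accepted:

^(?:([a-zA-Z_]{1,}\d{1,})+[a-zA-Z_]*|(\d{1,}[a-zA-Z_]{1,})+\d*)$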

edited Jan 11 '12 at 01:51 by Alan Moore
answered Jun 24 '10 at 09:25 by mylesmckeown

Exactly what I want... Thanks – Aniket kale Dec 24 '18 at 09:45

score 4 · Answer 15 · edited Aug 21 '19 at 09:39

Here is the regex for what you want, with a quantifier that requires at least 1 and no more than 255 characters:

^[a-zA-Z0-9_]{1,255}$
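
A bounded-length check along these lines can be sketched in Python, assuming the intent is to accept between 1 and 255 letters, digits, or underscores and nothing else (the names are illustrative):

```python
import re

# {1,255} bounds the length; fullmatch() supplies the anchoring.
BOUNDED = re.compile(r"[a-zA-Z0-9_]{1,255}")


def is_valid_name(text: str) -> bool:
    """Return True for 1 to 255 ASCII letters, digits, or underscores."""
    return BOUNDED.fullmatch(text) is not None


print(is_valid_name("a" * 255))  # True
print(is_valid_name("a" * 256))  # False, too long
print(is_valid_name(""))         # False, too short
```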

edited Aug 21 '19 at 09:39 by barbsan
answered Dec 03 '08 at 04:44 by mson

score 3 · Answer 16 · edited Oct 17 '22 at 20:10

I believe your matches are not taking accented Latin characters into account.

For example, if you need to accept "ã" or "ü", "\w" won't work in flavors where it is ASCII-only.

Alternatively, you can use this approach (note that the ranges À-Ý and à-ý also include × and ÷):

^[A-ZÀ-Ýa-zà-ý0-9_]+$

edited Oct 17 '22 at 20:10 by Peter Mortensen
answered Feb 08 '19 at 14:08 by Marcio Martins

score 2 · Answer 17 · edited Oct 17 '22 at 20:03

^\w*$ will work for the below combinations:

1
123
1av
pRo
av1

edited Oct 17 '22 at 20:03 by Peter Mortensen
answered Nov 14 '17 at 15:50 by Mukund Thakkar

What about an empty line. Is it also an alphanumeric string? – v010dya Mar 17 '20 at 18:21

score 2 · Answer 18 · edited Oct 17 '22 at 20:09

For Java, only upper- and lowercase alphanumeric characters and underscore are allowed.

^ matches the start of the string.
[a-zA-Z0-9_]+ matches one or more alphanumeric characters or underscores.
$ matches the end of the string.

public class RegExTest {
    public static void main(String[] args) {
        // Prints false: '#' is not in the allowed set.
        System.out.println("_C#".matches("^[a-zA-Z0-9_]+$"));
    }
}

score 1 · Answer 19 · edited Oct 17 '22 at 20:03

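This works for me. You can try the following (\p{Alnum} is a POSIX character class supported by Java's java.util.regex; the doubled backslash is what a Java string literal requires, and the anchors and + make it check the whole string):

^[\\p{Alnum}_]+$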

edited Oct 17 '22 at 20:03 by Peter Mortensen
answered May 20 '15 at 13:02 by Saurabh

I try this and I get unknown property Alnum, where is this defined? – Chuck Savage Jan 15 '22 at 17:37

score 1 · Answer 20 · answered Dec 03 '08 at 04:33

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

answered Dec 03 '08 at 04:33 by David Norman

score -1 · Answer 21 · edited Oct 17 '22 at 20:08

Required Format

Allow these three:

0142171547295
014-2171547295
123abc

Don't allow other formats:

validatePnrAndTicketNumber() {
    const alphaNumericRegex = /^[a-zA-Z0-9]*$/;
    const numericRegex = /^[0-9]*$/;
    // Three digits (a leading 0 is allowed, as in 014-2171547295), an optional dash, then ten digits.
    const numericDashRegex = /^(([0-9]{3})\-?([0-9]{10}))$/;
    const control = this.requestForm.controls["bookingReference"];
    this.currBookingRefValue = control.value;
    if (this.currBookingRefValue.length == 14 && this.currBookingRefValue.match(numericDashRegex)) {
        control.setErrors({'pattern': false});
    } else if (this.currBookingRefValue.length == 6 && this.currBookingRefValue.match(alphaNumericRegex)) {
        control.setErrors({'pattern': false});
    } else if (this.currBookingRefValue.length == 13 && this.currBookingRefValue.match(numericRegex)) {
        control.setErrors({'pattern': false});
    } else {
        control.setErrors({'pattern': true});
    }
}

<input name="booking_reference" type="text" [class.input-not-empty]="bookingRef.value"
    class="glyph-input form-control floating-label-input" id="bookings_bookingReference"
    value="" maxlength="14" aria-required="true" role="textbox" #bookingRef
    formControlName="bookingReference" (focus)="resetMessageField()" (blur)="validatePnrAndTicketNumber()"/>
