I am trying to write a regular expression in JavaScript to match a name field, where the only allowed characters are letters, apostrophes and hyphens. For example, the following names should be matched:

jhon's
avat-ar
Josh

Could someone please help me construct such a regex?

Perception
zoom_pat277
  • First off, define "letters". Because there's an awful lot of them – Tomalak Apr 27 '10 at 13:58
  • Letters would be 'a' to 'z', both upper and lower case, since they are required for the first name and last name input fields – zoom_pat277 Apr 27 '10 at 14:00
  • possible duplicate of http://stackoverflow.com/questions/421046/what-are-all-of-the-allowable-characters-for-peoples-names – Joachim Sauer Apr 27 '10 at 14:03
  • Well! You have a valid point here, but my requirements doc just says letters, apostrophes, and hyphens... I do not even know what the 'ü' in 'Jürgen Müller' is called... but I would be curious to know what those characters are called and how we can come up with a regular expression to match them... – zoom_pat277 Apr 27 '10 at 14:07
  • @zoom: Time to go back and clarify your requirements, I guess. ;) It baffles me every time when people find out that there are more letters in the world than in US-ASCII. – Tomalak Apr 27 '10 at 14:10
  • @zoom: "ü" is definitely a letter, it's an "Umlaut-u" in German and called "LATIN SMALL LETTER U WITH DIAERESIS" in the Unicode standard. And German is only one of many languages that use non-ASCII characters in names (and other words). – Joachim Sauer Apr 27 '10 at 14:13
  • @Joachim: Thanks! That was a good piece of info. I really appreciate it. I am sure we are not doing this in other parts of our app, so we will not be implementing it here either, but we should eventually start accepting non-ASCII characters... Thanks again! – zoom_pat277 Apr 27 '10 at 14:18

3 Answers

Yes.

^[a-zA-Z'-]+$

Here,

  • ^ means start of the string, and $ means end of the string.
  • […] is a character class; it matches any single character listed inside it.
  • x+ means the preceding element (here, the character class) must occur one or more times.

Inside the character class,

  • a-z and A-Z are the lowercase and uppercase ASCII letters,
  • ' is the apostrophe, and
  • - is the hyphen. The hyphen must appear at the beginning or the end of the class (or be escaped as \-) to avoid confusion with the range separator as in a-z.
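
As a quick check of the pieces described above, here is a minimal sketch that runs the pattern against the names from the question. The names `NAME_PATTERN` and `isValidName` are made up for illustration and are not part of the answer.

```javascript
// Hypothetical helper; the pattern itself is the one from the answer.
// ^ and $ anchor the match to the whole string, + requires at least one character.
const NAME_PATTERN = /^[a-zA-Z'-]+$/;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidName("jhon's"));  // true
console.log(isValidName("avat-ar")); // true
console.log(isValidName("Josh"));    // true
console.log(isValidName("John091")); // false: digits are not in the class
console.log(isValidName(""));        // false: + needs at least one character
```

Without the ^ and $ anchors, `test` would succeed as soon as any substring matched, so "John091" would pass.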

Note that this class won't match non-ASCII letters such as ä. You have to include them separately, e.g.:

^[-'a-zA-ZÀ-ÖØ-öø-ſ]+$
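
To see what the extended class covers, the sketch below spells the same ranges with `\u` escapes, which sidesteps file-encoding problems. The code points are my reading of the literal characters in the pattern (À = U+00C0, Ö = U+00D6, Ø = U+00D8, ö = U+00F6, ø = U+00F8, ſ = U+017F), and the function name is hypothetical.

```javascript
// Same ranges as À-Ö, Ø-ö and ø-ſ, written as escapes.
// The gaps at U+00D7 (×) and U+00F7 (÷) are left out on purpose: they are not letters.
const LATIN_NAME_PATTERN = /^[-'a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]+$/;

function isValidLatinName(name) {
  return LATIN_NAME_PATTERN.test(name);
}

console.log(isValidLatinName("J\u00FCrgen"));             // true  ("Jürgen")
console.log(isValidLatinName("\u0141ukasz"));             // true  ("Łukasz")
console.log(isValidLatinName("a\u00D7b"));                // false (contains ×)
console.log(isValidLatinName("\u0418\u0432\u0430\u043D")); // false (Cyrillic "Иван")
```

This only covers Latin-1 Supplement and Latin Extended-A, so names in other scripts are still rejected.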
kennytm
  • @Robert - no. Both beginning and end are OK for "-" – DVK Apr 27 '10 at 13:59
  • I think this works for me, but it would be great if you could give me a little info on how you constructed this... I understood most of it, except how the hyphen position (at the start or at the end) in your regular expression works (as DVK mentioned in the above comment) and the significance of the + sign... Just for my understanding! Thank you! – zoom_pat277 Apr 27 '10 at 14:11
  • @KennyTM: That is indeed helpful... Thank you so much! I should start working more on Regular expression... Thanks – zoom_pat277 Apr 27 '10 at 15:25

A compact version for the Unicode world that will match letters and numbers from any script, plus the asterisk and hyphen.

/^[\p{L}\p{N}*-]+$/u

Explanation:

  • [] => character class definition
  • \p{L} => matches any kind of letter character from any language
  • \p{N} => matches any kind of numeric character
  • *- => matches asterisk and hyphen
  • + => quantifier: matches one or more times (greedy)
  • /u => Unicode flag. In JavaScript it makes the engine treat the pattern and the input as Unicode code points rather than UTF-16 code units, and it is required for \p{…} property escapes (ES2018 and later).

Note that if the hyphen is the first or last character in the class definition, it does not need to be escaped. If it appears anywhere else in the class definition it needs to be escaped, as it will otherwise be read as a range operator rather than a literal hyphen.
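
For comparison with what the question asked for, here is a small sketch that runs the pattern above next to a variant restricted to letters, apostrophes and hyphens. The variant `lettersOnly` is my own adaptation, not part of this answer, and both patterns assume an engine with ES2018 Unicode property escapes.

```javascript
// The answer's pattern: letters, numbers, asterisk and hyphen.
const answerPattern = /^[\p{L}\p{N}*-]+$/u;

// Assumed variant for the original question: letters, apostrophe and hyphen only.
const lettersOnly = /^[\p{L}'-]+$/u;

console.log(answerPattern.test("J\u00FCrgen")); // true  ("Jürgen")
console.log(answerPattern.test("123"));         // true:  \p{N} accepts digits
console.log(answerPattern.test("O'Brian"));     // false: no apostrophe in the class

console.log(lettersOnly.test("O'Brian"));       // true
console.log(lettersOnly.test("J\u00FCrgen"));   // true
console.log(lettersOnly.test("John091"));       // false
```

On engines that predate ES2018, a regex literal using \p{…} with the u flag is rejected as a syntax error.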

Epiphany
  • Nice solution for other languages, but it also matches numbers. It will match both "John091" and even "123". Tested in C# – Artemious Dec 20 '17 at 07:42
  • It throws an exception in javascript: Uncaught SyntaxError: Invalid regular expression: /^[\p{L}\p{N}*-]+$/: Invalid escape – Artemious Dec 20 '17 at 07:43

A more compact version is [\w'-]+
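
A short sketch of how this pattern behaves in JavaScript, where \w is equivalent to [A-Za-z0-9_]. The anchored variant is my addition for comparison, not part of this answer.

```javascript
// As written in the answer: no anchors, so any matching substring is enough.
const unanchored = /[\w'-]+/;

// Assumed anchored variant, so the whole string must match.
const anchored = /^[\w'-]+$/;

console.log(unanchored.test("John Smith!")); // true:  "John" alone satisfies it
console.log(anchored.test("John Smith!"));   // false: space and "!" are not allowed
console.log(anchored.test("O'Brian"));       // true
console.log(anchored.test("John0'9-1"));     // true:  \w includes digits
console.log(anchored.test("first_name"));    // true:  \w includes the underscore
console.log(anchored.test("J\u00FCrgen"));   // false: \w is ASCII-only ("Jürgen")
```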

camster
  • This is not for a name, because it also matches numbers. It will match both "John0'9-1" and even "1-23" – Artemious Dec 20 '17 at 07:38
  • @Artemious: this answer would indeed pass `[0-9_]` which the question disallows. But note: [Falsehoods Programmers Believe About Names, № 15: People's names do not contain numbers](https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/) Example: [Jennifer 8 Lee](https://en.wikipedia.org/wiki/Jennifer_8._Lee). Also, don't forget `'` in `O'Brian` or [d̶o̶u̶b̶l̶e̶ , ̶t̶r̶i̶p̶l̶e̶ , quadruple-barrelled names](https://en.wikipedia.org/wiki/Double-barrelled_name) Example: [John Graham-Cumming](http://blog.jgc.org/2010/06/your-last-name-contains-invalid.html) – GitaarLAB Jun 16 '18 at 07:58
  • You can definitely use numbers: https://www.quora.com/Can-human-names-contain-numbers – camster Jun 25 '20 at 20:00