I'm trying to create a regex for an HTML5 input so that a user can only enter alphabetic characters that may appear in a name: a-z, plus letters such as ö, ü, â, æ and so on, while also allowing whitespace and hyphens.

I have played around with some patterns, but nothing seems to work correctly. This is what I have so far: <input type="text" name="firstname" pattern="[a-zA-Z\x7f-\xff] " title="">

Does anyone have a quick answer for this?

Mike Sav

2 Answers

Since the HTML5 pattern attribute uses the same regex syntax as JavaScript, there is no simple way to refer to all alphabetic characters unless the engine supports Unicode property escapes such as \p{L} (added in ECMAScript 2018). Without them, you would need to write a rather huge expression (and update it as new alphabetic characters are added to Unicode). You would need to start from the Unicode character database and its definition of the General Category of characters, or rely on someone having done that for you.
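Where Unicode property escapes are available (they require the `u` flag in JavaScript), the General Category "Letter" can be referenced directly. The following is only a minimal sketch under that assumption: the name `looksLikeName` is hypothetical, and including `\p{M}` (combining marks) is my own choice so that accents typed as separate combining characters also pass.

```javascript
// Hypothetical helper: accepts letters from any script, combining marks,
// whitespace, and hyphens. Needs an engine with Unicode property escapes.
const NAME_RE = /^[\p{L}\p{M}\s-]+$/u;

function looksLikeName(value) {
  return NAME_RE.test(value);
}

console.log(looksLikeName("J\u00F6rg M\u00FCller-Schmidt")); // true
console.log(looksLikeName("\u00C6lfric"));                   // true
console.log(looksLikeName("R2D2"));                          // false (digits)
```

In markup, the equivalent would be something like pattern="[\p{L}\p{M}\s\-]+", but whether a given browser compiles the pattern attribute in a Unicode-aware mode is something to verify for your targets.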

However, for your practical purposes, testing for “alpha characters that may be in a name” is even more complex. There are non-alphabetic characters used in names, such as left single quotation mark (‘) in addition to normal quotation mark (’), and who knows what characters there might be? If this is about people’s real names, it is very difficult to impose restrictions that do not discriminate. If this is about user names in a system, for example, you can define the repertoire as you like, but [a-zA-Z\x7f-\xff] does not look adequate (it includes some control characters and some non-alphabetic characters and excludes many Latin letters commonly used in Europe).
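To make the last point concrete, this sketch probes the class from the question with a few individual characters. The probe characters are my own choice for illustration, not an exhaustive list.

```javascript
// The character class from the question, anchored so that it tests
// exactly one character at a time.
const askerClass = /^[a-zA-Z\x7f-\xff]$/;

const probes = [
  ["\u007F", "DELETE (control character)"], // accepted
  ["\u00D7", "multiplication sign"],        // accepted
  ["\u00F6", "o with diaeresis"],           // accepted
  ["\u0161", "s with caron"],               // rejected
  ["\u0142", "l with stroke"],              // rejected
];

for (const [ch, label] of probes) {
  console.log(label, askerClass.test(ch) ? "accepted" : "rejected");
}
```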

Jukka K. Korpela

There is a very simple way to apply the regex logic you would use for English to any language, using Unicode.

To match a range of Unicode characters, such as the uppercase letters [A-Z], we can use

[\u0041-\u005A], where \u0041 is the hex code for A and \u005A is the hex code for Z:

'matchCAPS leTTer'.match(/[\u0041-\u005A]+/g)
//output ["CAPS", "TT"]

In the same way, we can use other Unicode characters, or their equivalent hex codes, in code point order (e.g. \u0100-\u017F, the Latin Extended-A block) as listed by unicode.org.

Try [À-ž] as an example of a range, and modify the range to suit your requirements.

It will match all characters from À (U+00C0) to ž (U+017E).
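One caveat with a plain code point range: everything between the endpoints is included, letters or not. Between U+00C0 and U+017E that means the multiplication sign (U+00D7) and the division sign (U+00F7) also match. A sketch of a tighter variant that skips those two; the constant names are hypothetical:

```javascript
// Same range as [À-ž], written with escapes, anchored to one character.
const wideRange = /^[\u00C0-\u017E]$/;

// Tighter variant: the same span, but skipping U+00D7 and U+00F7.
const lettersOnly = /^[\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017E]$/;

console.log(wideRange.test("\u00D7"));   // true: multiplication sign is inside the range
console.log(lettersOnly.test("\u00D7")); // false
console.log(lettersOnly.test("\u00F7")); // false: division sign skipped too
console.log(lettersOnly.test("\u017E")); // true: z with caron still matches
```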

A sample regex would be

/[A-Za-zÀ-ž\-\s]+/
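Note that the pattern attribute has to match the whole field value, whereas an unanchored regex in script matches anywhere in the string. The sketch below emulates whole-value matching so the sample can be tried outside a form. The helper name `matchesWholeValue` is hypothetical, and using the `u` flag is an assumption about how the browser compiles the pattern.

```javascript
// Hypothetical helper: wraps a pattern source so it must match the entire value,
// which is how the pattern attribute behaves.
function matchesWholeValue(patternSource, value) {
  return new RegExp(`^(?:${patternSource})$`, "u").test(value);
}

// The sample class, with À and ž written as escapes (U+00C0 and U+017E).
const samplePattern = String.raw`[A-Za-z\u00C0-\u017E\-\s]+`;

console.log(matchesWholeValue(samplePattern, "Anne-Marie \u00D6zt\u00FCrk")); // true
console.log(matchesWholeValue(samplePattern, "J0hn"));                        // false

// Without anchors the same class reports a match, because "J" alone satisfies it.
console.log(/[A-Za-z\u00C0-\u017E\-\s]+/.test("J0hn"));                       // true
```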

For reference, see: Latin Unicode Character

Harpreet Singh