35

I am using JS Animated Contact Form with this line of validation regex:

rx:{".name":{rx:/^[a-zA-Z'][a-zA-Z-' ]+[a-zA-Z']?$/,target:'input'}, other fields...

I just found out that I can't enter a name like "Müller". The regex will not accept it. What do I have to do to also allow umlauts?

ajtrichards
user1555112

6 Answers

48

You should use Unicode escape codes for the characters in your regex, like \u0080. For German, I found the following table:

Zeichen     Unicode
------------------------------
Ä, ä        \u00c4, \u00e4
Ö, ö        \u00d6, \u00f6
Ü, ü        \u00dc, \u00fc
ß           \u00df

(source http://javawiki.sowas.com/doku.php?id=java:unicode)
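As a minimal sketch of how the table could be applied to the pattern from the question: the German escapes are added to each character class. The names `letters` and `germanName` are made up for illustration, and the hyphen is moved to the end of the class so it cannot be read as a range.

```javascript
// Letters allowed in a name: ASCII plus the German characters from the table.
// The double backslashes keep the \uXXXX escapes intact inside the string.
const letters = "a-zA-Z\\u00c4\\u00e4\\u00d6\\u00f6\\u00dc\\u00fc\\u00df";

// Same shape as the pattern in the question, with the extended letter set.
const germanName = new RegExp(
  "^[" + letters + "'][" + letters + "' -]+[" + letters + "']?$"
);

console.log(germanName.test("Müller"));  // true
console.log(germanName.test("Weiß"));    // true
console.log(germanName.test("Müller2")); // false, digits are not allowed
```

This only covers German. Letters from other languages (for example Ø or é) would still be rejected.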

IProblemFactory
26

Try using this:

/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/

I have added the unicode range \u00C0-\u017F to the start of each of the square bracket groups.

Given that /^[\u00C0-\u017FA-Za-z]+$/.test("aeiouçéüß") returns true, I expect it should work.

Credit to https://stackoverflow.com/a/11550799/940252.
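A quick check of the pattern above, as a sketch. Because the middle class is followed by `+`, the pattern needs at least two characters, so a single letter fails. The `relaxed` variant is only an assumption about what might be wanted and is not part of the original answer.

```javascript
// The pattern from this answer, unchanged.
const namePattern = /^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/;

console.log(namePattern.test("Müller")); // true
console.log(namePattern.test("ü"));      // false, at least two characters are required
console.log(namePattern.test("a×b"));    // true, although × (U+00D7) is not a letter

// Hypothetical variant: "*" instead of "+" so one-letter input passes.
// The trailing optional class is dropped because the middle class already covers it.
const relaxed = /^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z' -]*$/;

console.log(relaxed.test("ü"));      // true
console.log(relaxed.test("Müller")); // true
```

The range \u00C0-\u017F also contains the signs × (U+00D7) and ÷ (U+00F7), so it is slightly broader than "letters only".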

Community
Josh Harrison
  • `[\u00C0-\u017Fa-zA-Z']?$/` is kind of redundant, what are you trying to do? –  Feb 25 '14 at 17:17
  • I'm not sure as I'm not terribly hot on regex and the OP didn't specify the pattern they're hoping to match. I just worked with their original code. If you can clean it up please do! :) – Josh Harrison Feb 25 '14 at 17:21
  • I would venture to change that space to something else to capture all non-word characters like hyphens. Here's a test: https://regex101.com/r/zH5uV0/4 – Mike Kormendy Jul 24 '16 at 14:01
  • 2
    `/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/.test("ü") -> false` – Zane Hitchcox Aug 18 '19 at 04:13
10

In JS, you can use the u flag on regular expressions to enable Unicode property escapes, written \p{...}. \p is a Unicode-aware lookup that has a Letter category. This category matches letters from any script: German, Scandinavian, Cyrillic, etc.

In short, use this:

/\p{Letter}/u

Props to this article by Till Sanders.
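For the name field from the question, a sketch could look like the following. It assumes that the a-zA-Z parts of the original pattern are simply swapped for \p{Letter}, and `unicodeName` is a made-up name.

```javascript
// Same shape as the question's pattern, with \p{Letter} in place of a-zA-Z.
// The u flag is required, otherwise \p{...} is not treated as a property escape.
const unicodeName = /^[\p{Letter}'][\p{Letter}' -]+[\p{Letter}']?$/u;

console.log(unicodeName.test("Müller"));  // true
console.log(unicodeName.test("Łukasz"));  // true
console.log(unicodeName.test("Иван"));    // true
console.log(unicodeName.test("O'Brien")); // true
console.log(unicodeName.test("R2D2"));    // false, digits are not letters
```

Note that \p{Letter} does not match combining marks, so input that arrives in decomposed form (a base letter plus a separate accent) may need normalizing first, or \p{Mark} added to the classes.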

tony19
fredrikekelund
7

I came up with a combination of different ranges:

[A-Za-zÀ-ž\u0370-\u03FF\u0400-\u04FF]

But I see that it misses some letters from @SambitD's proposal, see: https://rubular.com/r/2g00QJK4rBS8Y4
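A small sketch of what the combined ranges accept and miss. The wrapper `/^...+$/` and the name `lettersOnly` are added here just for testing whole strings.

```javascript
// Latin (À-ž), Greek (\u0370-\u03FF) and Cyrillic (\u0400-\u04FF) ranges combined.
const lettersOnly = /^[A-Za-zÀ-ž\u0370-\u03FF\u0400-\u04FF]+$/;

console.log(lettersOnly.test("Müller")); // true
console.log(lettersOnly.test("Ελένη"));  // true (Greek)
console.log(lettersOnly.test("Иван"));   // true (Cyrillic)

// Letters beyond U+017E are missed, for example:
console.log(lettersOnly.test("\u0226")); // false, Ȧ (U+0226)
console.log(lettersOnly.test("\u1E0D")); // false, ḍ (U+1E0D)
```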

Tsunamis
4

I used

A-Za-z-ÁÀȦÂÄǞǍĂĀÃÅǺǼǢĆĊĈČĎḌḐḒÉÈĖÊËĚĔĒẼE̊ẸǴĠĜǦĞG̃ĢĤḤáàȧâäǟǎăāãåǻǽǣćċĉčďḍḑḓéèėêëěĕēẽe̊ẹǵġĝǧğg̃ģĥḥÍÌİÎÏǏĬĪĨỊĴĶǨĹĻĽĿḼM̂M̄ʼNŃN̂ṄN̈ŇN̄ÑŅṊÓÒȮȰÔÖȪǑŎŌÕȬŐỌǾƠíìiîïǐĭīĩịĵķǩĺļľŀḽm̂m̄ʼnńn̂ṅn̈ňn̄ñņṋóòôȯȱöȫǒŏōõȭőọǿơP̄ŔŘŖŚŜṠŠȘṢŤȚṬṰÚÙÛÜǓŬŪŨŰŮỤẂẀŴẄÝỲŶŸȲỸŹŻŽẒǮp̄ŕřŗśŝṡšşṣťțṭṱúùûüǔŭūũűůụẃẁŵẅýỳŷÿȳỹźżžẓǯßœŒçÇ

which supports almost all European characters. Source of truth

isambitd
  • 8
    No sane programmer would list all characters, when there are shorthand character classes and ranges. Please, don't do that. – user1438038 Dec 17 '19 at 14:21
  • 1
    @user1438038: well, I would actually prefer listing the characters explicitly in some use cases, because ranges can contain unwanted characters, and this way you see them all immediately. It is definitely good in unit test code. isambitd: this list is missing ąęł, which means it doesn't support Polish – godfryd Jan 31 '23 at 17:02
0

The problem with the \uXXXX approach is that it is not supported by all regex flavours. For example, Visual C++ does not support it. There, you would need to enumerate the actual letters.

I recommend using a tool like https://www.regexbuddy.com/ that knows as many flavours as possible.