Regex for Umlaut

Question

I am using JS Animated Contact Form with this line of validation regex:

rx:{".name":{rx:/^[a-zA-Z'][a-zA-Z-' ]+[a-zA-Z']?$/,target:'input'}, other fields...

I just found out that I can't enter a name like "Müller". The regex will not accept it. What do I have to do to also allow umlauts?

You could use `\w` as 'word', but you'll have to test whether that matches the umlaut. – Martijn, Feb 25 '14 at 14:55
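A quick way to test the `\w` suggestion, as a minimal sketch: in JavaScript `\w` covers only ASCII letters, digits and the underscore, so it does not match umlauts (and it would let digits into a name). The sample strings and variable names are made up for illustration.

```javascript
// In JavaScript, \w is equivalent to [A-Za-z0-9_], so umlauts fall outside it.
const wordOnly = /^\w+$/;

const plainOk = wordOnly.test("Mueller");       // ASCII letters only
const umlautOk = wordOnly.test("M\u00fcller");  // "Müller", contains U+00FC

console.log(plainOk, umlautOk);
```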

score 48 · Accepted Answer · answered Feb 25 '14 at 14:54


You should use Unicode escapes for these characters in your regex, like \u0080. For German, I found the following table:

Character   Unicode
------------------------------
Ä, ä        \u00c4, \u00e4
Ö, ö        \u00d6, \u00f6
Ü, ü        \u00dc, \u00fc
ß           \u00df

(source http://javawiki.sowas.com/doku.php?id=java:unicode)
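As a minimal sketch, the escapes from the table can be added to each character class of the regex in the question. The names `GERMAN` and `nameRx` are made up for this example, and the hyphen is moved to the end of the class only to keep it unambiguous.

```javascript
// Escapes from the table above: Ä ä Ö ö Ü ü ß
const GERMAN = "\\u00c4\\u00e4\\u00d6\\u00f6\\u00dc\\u00fc\\u00df";

// Same structure as the regex in the question, with the German letters
// added to each character class.
const nameRx = new RegExp(
  "^[a-zA-Z" + GERMAN + "']" +
  "[a-zA-Z" + GERMAN + "' -]+" +
  "[a-zA-Z" + GERMAN + "']?$"
);

const acceptsMueller = nameRx.test("M\u00fcller");  // "Müller"
const acceptsGross = nameRx.test("Gro\u00df");      // "Groß"
const acceptsDigit = nameRx.test("M\u00fcller1");   // digits are still rejected

console.log(acceptsMueller, acceptsGross, acceptsDigit);
```

Letters outside this list (for example é or ø) are still rejected, which may or may not be what you want for a name field.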


IProblemFactory



[The holy grail](http://unicode-table.com/en). You can also write ranges, e.g. `[\u00F0-\u02AF]`. – tenub Feb 25 '14 at 14:56

Use this to find hidden characters except German Umlaute: https://regexr.com/4pmml – ThreeCheeseHigh Nov 27 '19 at 20:24

If you are using PHP, use `[\x{00F0}-\x{02AF}]`. – Alberto Sinigaglia Sep 09 '21 at 10:51

score 26 · Answer 2 · edited May 23 '17 at 11:54


Try using this:

/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/

I have added the Unicode range \u00C0-\u017F to the start of each of the square-bracket groups.

Given that /^[\u00C0-\u017FA-Za-z]+$/.test("aeiouçéüß") returns true, I expect it should work.

Credit to https://stackoverflow.com/a/11550799/940252.
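A small sketch to sanity-check this pattern. Two caveats are worth knowing: the pattern needs at least two characters (one for the first class, one or more for the second), and the range \u00C0-\u017F also contains the non-letter signs × (U+00D7) and ÷ (U+00F7). The sample inputs are invented for illustration.

```javascript
const nameRx = /^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/;

const results = {
  mueller: nameRx.test("M\u00fcller"), // "Müller": accepted
  single: nameRx.test("\u00fc"),       // "ü": rejected, fewer than two characters
  times: nameRx.test("a\u00d7b"),      // "a×b": accepted, although × is not a letter
};

console.log(results);
```

If single-letter input should pass, changing the `+` after the second group to `*` would be one option, assuming nothing else relies on the two-character minimum.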


answered Feb 25 '14 at 15:01

Josh Harrison


`[\u00C0-\u017Fa-zA-Z']?$/` is kind of redundant; what are you trying to do? – Feb 25 '14 at 17:17
I'm not sure as I'm not terribly hot on regex and the OP didn't specify the pattern they're hoping to match. I just worked with their original code. If you can clean it up please do! :) – Josh Harrison Feb 25 '14 at 17:21
I would venture to change that space to something else to capture all non-word characters like hyphens. Here's a test: https://regex101.com/r/zH5uV0/4 – Mike Kormendy Jul 24 '16 at 14:01

`/^[\u00C0-\u017Fa-zA-Z'][\u00C0-\u017Fa-zA-Z-' ]+[\u00C0-\u017Fa-zA-Z']?$/.test("ü") -> false` – Zane Hitchcox Aug 18 '19 at 04:13

score 10 · Answer 3 · edited Feb 22 '23 at 21:14


In JS, you can use the u flag on regular expressions to enable a special escape sequence, namely \p. \p{...} is a Unicode-aware property lookup, and one of the categories it supports is Letter. This category matches letters from German, Swedish and other Scandinavian languages, Cyrillic scripts, etc.

In short, use this:

/\p{Letter}/u

Props to this article by Till Sanders.
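Applied to the name check from the question, a sketch along these lines might work, assuming a runtime that supports Unicode property escapes (ES2018 or later). The variable names and sample inputs are made up.

```javascript
// \p{Letter} (short form: \p{L}) is only recognised when the u flag is set.
const nameRx = /^[\p{Letter}'][\p{Letter}' -]+[\p{Letter}']?$/u;

const checks = {
  german: nameRx.test("M\u00fcller"),                // "Müller"
  polish: nameRx.test("\u0141ukasz"),                // "Łukasz"
  cyrillic: nameRx.test("\u0418\u0432\u0430\u043D"), // "Иван"
  withDigits: nameRx.test("R2D2"),                   // digits are not letters
};

console.log(checks);
```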


answered Dec 08 '21 at 10:19

fredrikekelund


score 7 · Answer 4 · answered Sep 02 '19 at 09:08

I came up with a combination of different ranges:

[A-Za-zÀ-ž\u0370-\u03FF\u0400-\u04FF]

However, I see that it misses some letters from @SambitD's proposal; see https://rubular.com/r/2g00QJK4rBS8Y4
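A rough sketch of what this class does and does not cover in JavaScript. The range À-ž is written here as \u00C0-\u017E, followed by the Greek (U+0370-U+03FF) and Cyrillic (U+0400-U+04FF) blocks; the variable names and sample words are made up for illustration.

```javascript
// Latin letters plus U+00C0-U+017E (À-ž), Greek and Cyrillic.
const letters = /^[A-Za-z\u00C0-\u017E\u0370-\u03FF\u0400-\u04FF]+$/;

const checks = {
  german: letters.test("M\u00fcller"),                   // "Müller"
  greek: letters.test("\u03A9\u03BC\u03AD\u03B3\u03B1"), // "Ωμέγα"
  cyrillic: letters.test("\u0418\u0432\u0430\u043D"),    // "Иван"
  tildeE: letters.test("\u1EBD"), // "ẽ" (U+1EBD) lies outside all three ranges
};

console.log(checks);
```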

score 4 · Answer 5 · answered May 24 '19 at 13:38


I used

A-Za-z-ÁÀȦÂÄǞǍĂĀÃÅǺǼǢĆĊĈČĎḌḐḒÉÈĖÊËĚĔĒẼE̊ẸǴĠĜǦĞG̃ĢĤḤáàȧâäǟǎăāãåǻǽǣćċĉčďḍḑḓéèėêëěĕēẽe̊ẹǵġĝǧğg̃ģĥḥÍÌİÎÏǏĬĪĨỊĴĶǨĹĻĽĿḼM̂M̄ʼNŃN̂ṄN̈ŇN̄ÑŅṊÓÒȮȰÔÖȪǑŎŌÕȬŐỌǾƠíìiîïǐĭīĩịĵķǩĺļľŀḽm̂m̄ŉńn̂ṅn̈ňn̄ñņṋóòôȯȱöȫǒŏōõȭőọǿơP̄ŔŘŖŚŜṠŠȘṢŤȚṬṰÚÙÛÜǓŬŪŨŰŮỤẂẀŴẄÝỲŶŸȲỸŹŻŽẒǮp̄ŕřŗśŝṡšşṣťțṭṱúùûüǔŭūũűůụẃẁŵẅýỳŷÿȳỹźżžẓǯßœŒçÇ

which supports almost all the characters used in Europe. Source of truth


isambitd



No sane programmer would list all characters, when there are shorthand character classes and ranges. Please, don't do that. – user1438038 Dec 17 '19 at 14:21

@user1438038: well, I actually would prefer listing the characters explicitly in some use cases, because ranges can contain unwanted characters, and this way you see them all immediately. It's definitely good in unit test code. isambitd: this list is missing ąęł, which means it doesn't support Polish. – godfryd Jan 31 '23 at 17:02

score 0 · Answer 6 · answered Mar 22 '18 at 11:49

The problem with the \uXXXX approach is that it is not supported by all regex flavours. For example, Visual C++ does not support it. There, you would need to enumerate the actual letters.

I recommend using a tool like https://www.regexbuddy.com/ that knows as many flavours as possible.
