
I use two regular expressions to match text.

var RegExp = /[^\W\d](\w|[-'](?=\w))*/gi;
var RegExpOthers = /[^\W\d]{4,}(\w|[-']{1,2}(?=\w))*/gi;

Currently, no word containing letters such as ä, ü, or ö (German alphabet) is matched.

How can I extend these expressions?

Cœur
mm1975
  • possible duplicate of [Regex for Umlaut](http://stackoverflow.com/questions/22017723/regex-for-umlaut), and see also [Javascript + Unicode regexes](http://stackoverflow.com/questions/280712/javascript-unicode-regexes) – apsillers Sep 18 '14 at 12:35

2 Answers


This should do it:

[^\x00-\x7F]+

It matches any character that is not in the ASCII character set (0-127, i.e. 0x00 to 0x7F). You can write the same thing with Unicode escapes:

[^\u0000-\u007F]+
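
A minimal sketch of how this class could be folded into the expression from the question, assuming it is acceptable to treat every non-ASCII character as a letter (that also admits symbols such as € or ×). The names `letter`, `wordChar`, `wordRegex`, and `mandatory` are made up for illustration:

```javascript
// Assumption: any character outside ASCII counts as a letter.
var letter = "(?:[^\\W\\d]|[^\\x00-\\x7F])"; // ASCII letter or underscore, or any non-ASCII character
var wordChar = "(?:\\w|[^\\x00-\\x7F])";     // ASCII word character, or any non-ASCII character

// Same shape as the first expression in the question, with the widened classes.
var wordRegex = new RegExp(letter + "(?:" + wordChar + "|[-'](?=" + wordChar + "))*", "gi");

var text = "der schöne baum ist schön";
var words = text.match(wordRegex);
console.log(words); // → ["der", "schöne", "baum", "ist", "schön"]

// For contrast: a mandatory non-ASCII character placed in sequence
// only matches words that actually contain one.
var mandatory = /[^\W\d]+[^\x00-\x7F](\w|[-'](?=\w))*/gi;
var onlyUmlautWords = text.match(mandatory);
console.log(onlyUmlautWords); // → ["schöne", "schön"]
```

Putting the non-ASCII class inside the alternations keeps it optional, so plain ASCII words still match. Used as a separate element in the sequence, it becomes a required character.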
alessandrio
  • I've tried this. If I use the expression var RegExp = /[^\W\d]+[^\x00-\x7F](\w|[-'](?=\w))*/gi; it returns only "schön" and "schöne" for the string "der schöne baum ist schön" – mm1975 Sep 18 '14 at 13:03
  • You can add all the possible characters – alessandrio Sep 18 '14 at 13:11
  • Complete list of [German characters](http://homepages.rootsweb.ancestry.com/~george/ansi_ascii_character_chart.html) – alessandrio Sep 18 '14 at 13:14
  • Sorry, I have to ask again: if I use the RegExp mentioned in my comment, why does it return only "schön" and "schöne" and not the words "der", "baum" and "ist"? I think I have a syntax error. – mm1975 Sep 18 '14 at 14:02

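This is a matter of Unicode characters.

What can happen is that ä, ü, ö in your example are not single characters but two, because the diaeresis (the two dots, not a tilde) can be encoded as a separate combining character. Independently of that, JavaScript's \w only covers [A-Za-z0-9_], so even the single-character forms are not matched. This brings a lot of complexity, and several rules have to be followed to handle Unicode correctly.

You could use something like ([\x{0049}-\x{0130}]) to match, for example, the variants of i with diacritics, but the syntax varies depending on whether you use the expression in .NET, Java, JavaScript or PHP. In JavaScript the same range is written [\u0049-\u0130].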


You could also check what code each character represents here:

http://www.fileformat.info/info/unicode/char/search.htm?q=%C4%B0&preview=entity
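
For JavaScript specifically, the two points above can be sketched as follows. This is not part of the original suggestion: it assumes an engine that supports ES2018 Unicode property escapes (`\p{...}` with the `u` flag) and `String.prototype.normalize`, and the variable names are made up for illustration.

```javascript
// Assumes ES2018+ (Unicode property escapes require the `u` flag).
// \p{L} = any letter, \p{M} = combining mark, \p{N} = any number.
var unicodeWord = /[\p{L}_](?:[\p{L}\p{M}\p{N}_]|[-'](?=[\p{L}\p{M}\p{N}_]))*/gu;

var composed = "sch\u00F6n";    // ö stored as one code point (U+00F6)
var decomposed = "scho\u0308n"; // o followed by a combining diaeresis (U+0308)

console.log(composed.length, decomposed.length);       // → 5 6
console.log(decomposed.normalize("NFC") === composed); // → true

// Both encodings are matched as whole words.
var text = "der sch\u00F6ne baum ist " + decomposed;
var found = text.match(unicodeWord);
console.log(found.length); // → 5
```

Normalizing the input to NFC before matching is one way to make both encodings behave the same in expressions that only list precomposed characters.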

Dalorzo