5

I need to validate that a field is not empty. It should allow English and foreign-language characters (UTF-8), but not special characters. I'm not good at regex, so any help on this would be great...

Stranger
  • 10,332
  • 18
  • 78
  • 115

3 Answers

6

If you want to support a wide range of languages, you'll have to work by excluding only the characters you don't want, since specifying all of the ranges you do want will be difficult.

You'll need to look at the list of Unicode blocks and/or the character database to identify the blocks you want to exclude (for instance, U+0000 through U+001F). This Wikipedia article may also help.

Then use a regular expression with character classes to look for what you want to exclude.

For example, this will check for the U+0000 through U+001F and the U+007F characters (obviously you'll be excluding more than just these):

if (/[\u0000-\u001F\u007F]/.exec(theString)) {
    // Contains at least one invalid character
}

The [] identify a "character class" (list and/or range of characters to look for). That particular one says look for \u0000 through \u001F (inclusive) as well as \u007F.
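As a rough sketch of how the exclusion approach could be turned into a complete field check: the helper name `isValidField` and the exact set of excluded characters (control characters plus ASCII punctuation) are assumptions for illustration, not a definitive list of "special characters". Adjust the ranges to whatever you actually want to reject.

```javascript
// Illustrative exclusion list (an assumption, extend as needed):
//   U+0000-U+001F, U+007F  control characters
//   U+0021-U+002F          ! " # $ % & ' ( ) * + , - . /
//   U+003A-U+0040          : ; < = > ? @
//   U+005B-U+0060          [ \ ] ^ _ `
//   U+007B-U+007E          { | } ~
const INVALID_CHARS =
  /[\u0000-\u001F\u007F\u0021-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]/;

// Hypothetical helper: true when the value is non-empty (ignoring
// surrounding whitespace) and contains none of the excluded characters.
function isValidField(value) {
  return value.trim().length > 0 && !INVALID_CHARS.test(value);
}
```

Because this only rejects what is listed, letters from any script pass through, and so does anything else you have not excluded yet (digits and non-ASCII punctuation, for example).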

T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875
  • I think in the last line you mean "\u001F (inclusive)...." instead of "\u007F (inclusive)....." – Mr.Hunt Nov 22 '13 at 09:54
4

It would have been nice if I could say "Just do /^\w+$/.test(word)", but...

See this answer for the current state of Unicode support (or rather, the lack of it) in JavaScript regular expressions.

You can either use the library he suggests, which might be slow, or enlist the help of the server for this (which might be slower).

Emil Ivanov
  • 37,300
  • 12
  • 75
  • 90
  • You can use this library with `\p{L}` (remember to double-escape backslashes). You can also cannibalize its code for the regexp you need: `http://xregexp.com/addons/unicode/unicode-base.js` defines the ranges you need for the Letter category. It's not in the regexp format, you'd have to convert it. – Amadan Dec 13 '12 at 08:09
  • Oops, by "this library" I meant [XRegExp](http://xregexp.com/) that the linked answer links to, specifically the unicode base [addon](http://xregexp.com/plugins/) and its `Letter` (`L`) class. – Amadan Dec 13 '12 at 08:45
  • Hi Emil, i'm not good at Regex. Can you please explain how the above Regex will validate? – Stranger Dec 14 '12 at 07:23
  • 1
    It should validate, but won't. How it works is simple: the `^` and `$` around it mean the whole string has to match (without them, the regex matches the longest run it can find, which is not necessarily everything). The `\w` means any alphanumeric character (excluding digits later is easy) and `+` means match 1 or more of those `\w`s. – Emil Ivanov Dec 14 '12 at 08:19
0

You can test for a Unicode letter like this:

str.match(/\p{L}/u)

Or for the existence of a non-letter like this:

str.match(/[^\p{L}]/u)
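
Putting the two checks together, a minimal sketch of a whole-field validator might look like the following. The name `isLettersOnly` is hypothetical, and allowing plain spaces and combining marks (`\p{M}`, needed for decomposed accents and for scripts such as Devanagari) is an assumption about what the field should accept. It relies on Unicode property escapes, which need the `u` flag and an engine that supports them (ES2018 or later).

```javascript
// One or more letters, combining marks, or spaces, anchored to the
// whole string. The `u` flag is required for \p{...} to work.
const ONLY_LETTERS = /^[\p{L}\p{M} ]+$/u;

// Hypothetical helper: true when the value is non-empty (ignoring
// surrounding whitespace) and consists only of the allowed characters.
function isLettersOnly(value) {
  return value.trim().length > 0 && ONLY_LETTERS.test(value);
}
```

Unlike the exclusion approach, this rejects digits and all punctuation by default, so anything else you want to permit has to be added to the character class explicitly.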
pguardiario
  • 53,827
  • 19
  • 119
  • 159