I need to prevent my users from entering non-European characters in a text box.

For example, here's how I disallow Cyrillic:

$('.test').keyup(function(e) {
    var toTest = $(this).val();
    // U+0400-U+04FF is the Cyrillic block
    var rforeign = /[\u0400-\u04FF]/i;
    if (rforeign.test(toTest)) {
        alert("No cyrillic allowed");
        $(this).val('');
    }
});

But I also need to exclude Arabic, Japanese, and so on.

I just want to allow:

  • ASCII English, standard characters
  • Italian accented letters: à è ì ò ù á é í ó ú
  • other special characters from European languages: French, German...

Is there a way to do that with ranges?

I tried /[\u0400-\u04FF]/i but it just allows ASCII English (not Italian for example).

Clover
Fabio B.
  • I think you need to define "Europe". Cyrillic scripts are used in a handful of European countries, so where exactly are you drawing the line? Do you consider Iceland to be a part of Europe? – Tim Pietzcker Apr 17 '15 at 07:42
  • What about Arabic numerals...? There are several on this very page... – deceze Apr 17 '15 at 08:08

2 Answers

Just allow Unicode characters in some given ranges, e.g.

/^[a-z\u00C0-\u00F6\u00F8-\u017E]+$/i

Example fiddle: https://jsfiddle.net/4y6e6bj5/3/


This regular expression allows Basic Latin letters plus the Latin-1 Supplement and Latin Extended-A ranges that hold accented letters and diacritics. It rejects any other alphabet or symbol.

If you need to allow other specific Unicode characters, look at the Unicode table and add as many ranges as you need to the regular expression.
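
As a rough illustration of the whitelist approach, the regular expression above can be wrapped in a small check. This is only a sketch: the names `EUROPEAN_LATIN` and `isEuropeanLatin` are hypothetical and do not come from the fiddle, and the strings use `\u` escapes so the example does not depend on the file encoding.

```javascript
// Whitelist taken from the regular expression above:
// a-z (case-insensitive) plus U+00C0-U+00F6 and U+00F8-U+017E.
var EUROPEAN_LATIN = /^[a-z\u00C0-\u00F6\u00F8-\u017E]+$/i;

// Hypothetical helper: true only when every character is in the whitelist.
function isEuropeanLatin(text) {
    return EUROPEAN_LATIN.test(text);
}

console.log(isEuropeanLatin("perch\u00E9"));  // true  (é is U+00E9)
console.log(isEuropeanLatin("Stra\u00DFe"));  // true  (ß is U+00DF)
console.log(isEuropeanLatin("\u0424"));       // false (Cyrillic Ф)
console.log(isEuropeanLatin(""));             // false ("+" needs at least one character)
```

Because the pattern describes what is allowed, a handler that wants to reject input has to act when the test returns false, not when it returns true.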

Fabrizio Calderan
  • While this works perfectly, it doesn't detect non-European characters so much as detect non-non-European characters. – Docteur Apr 17 '15 at 07:37
  • this is not working... I replaced your regex in my code and still am able to type ية or Ф without any alert! – Fabio B. Apr 17 '15 at 07:39
  • ok thank you, I just needed to check the negative match, it works. what if I need to allow punctuation and white spaces too? – Fabio B. Apr 17 '15 at 07:56
  • as you can see from the unicode table I've linked, punctuation is in the unicode range `\u0020-\u002F`. Include it in the regexp. – Fabrizio Calderan Apr 17 '15 at 07:58
  • I had XML containing a bunch of bidirectional characters (Arabic and Hebrew), as well as Tamil characters, produced in error via OCR. I found them using [\u0590-\u1000]. – Will Hanley May 11 '20 at 15:03
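
The two points raised in the comments above (testing for the negative match, and adding punctuation and whitespace) might be combined as in the sketch below. It is an assumption-laden example, not the asker's final code: `ALLOWED` and `hasForbiddenCharacters` are hypothetical names, `\s` is added for whitespace beyond the plain space, and `*` is used instead of `+` so that an empty field is not rejected.

```javascript
// Assumption: the answer's letter ranges, plus U+0020-U+002F
// (space and ASCII punctuation such as ! " ' , - . /) and \s for other whitespace.
var ALLOWED = /^[a-z\u00C0-\u00F6\u00F8-\u017E\u0020-\u002F\s]*$/i;

// Hypothetical helper: true when the text contains something outside the whitelist.
function hasForbiddenCharacters(text) {
    return !ALLOWED.test(text);  // negative match: reject when the whitelist fails
}

console.log(hasForbiddenCharacters("Ciao, citt\u00E0!"));  // false
console.log(hasForbiddenCharacters("\u064A\u0629"));       // true (Arabic)
console.log(hasForbiddenCharacters("Is it ok?"));          // true ("?" is U+003F)
```

Note that digits, `:`, `;` and `?` lie outside U+0020-U+002F, so they would need their own ranges if they should be accepted.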

Use a negated character set:

[^A-Za-zàèìòùáéíóú(othercharacters..)]
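
A negated set can also report which characters are offending. The sketch below is one possible reading of this idea: since the set above leaves its remaining characters unspecified, it borrows the ranges from the other answer as an assumption, and `NOT_ALLOWED` and `findForbidden` are hypothetical names.

```javascript
// Assumption: the allowed characters are A-Z, a-z and the ranges
// U+00C0-U+00F6 and U+00F8-U+017E from the other answer.
// The leading ^ inside the brackets negates the set.
var NOT_ALLOWED = /[^A-Za-z\u00C0-\u00F6\u00F8-\u017E]/g;

// Hypothetical helper: returns every character that falls outside the allowed set.
function findForbidden(text) {
    return text.match(NOT_ALLOWED) || [];  // match() returns null when nothing is found
}

console.log(findForbidden("caff\u00E8"));              // []
console.log(findForbidden("caff\u00E8\u0424\u064A"));  // the Cyrillic and Arabic characters
```

With this form the test is positive: any match means the input contains a character that should be refused.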
Docteur