20

I'm trying to implement the solution provided in this answer, but it's not working. The code in this jsFiddle looks like this:

function Start() {

    $('#TheBox').keyup(function () {

        var TheInput = $('#TheBox').val();      
        var TheCleanInput = TheInput.replace(/([.\p{L}])/g, '');

        $('#TheBox').val(TheCleanInput);
    });
}

$(Start);

Basically, I'm looking to allow letters such as é è ô as well as numbers. What do I need to change to make the regex filter work?

halfer
frenchie
  • 5
    JavaScript regex has no Unicode character class, so you can forget `\p{L}`. You can always build a character class with an explicit list of accented characters. – Casimir et Hippolyte May 13 '15 at 21:34
  • 1
    See http://stackoverflow.com/questions/17357716/javascript-regex-unicode-diacritic-combining-characters – James May 13 '15 at 21:39
  • @James: I saw it but it doesn't provide an answer. I looked at this question but I haven't been able to make it work either: http://stackoverflow.com/questions/19001140/amend-regular-expression-to-allow-german-umlauts-french-accents-and-other-valid – frenchie May 13 '15 at 21:46
  • @frenchie: I think James' answer is basically the correct one: "You can't do that in Javascript." You're either going to have to do some AJAX to hit a server-side script that has real Unicode support or make a hacky best effort in JS. – Conspicuous Compiler May 13 '15 at 22:42
  • @ConspicuousCompiler: Actually, can you try taking diacritics from french.typeit.org as well as other languages, then use the regex in the answer with the jsfiddle and see where it fails? It all seems to actually work. http://jsfiddle.net/cv1o0ywc/2/ – frenchie May 14 '15 at 15:12
  • I would have up-voted this question except for dumping 70KB of jQuery when clients *already support JavaScript*. – John Sep 05 '17 at 01:14

4 Answers

40

As Casimir et Hippolyte stated in the comments, JavaScript does not support the \p{L} Unicode character class.
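That was the situation when this was written (2015). Engines that implement ES2018 accept Unicode property escapes such as `\p{L}`, but only when the `u` flag is set. A minimal sketch, assuming such an engine; `cleanWithPropertyEscapes` is a hypothetical helper name, not something from the question:

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes.
// Without the `u` flag, \p{L} is not treated as a property escape.
function cleanWithPropertyEscapes(text) {
  // Negated class: remove everything that is NOT a Unicode letter or an ASCII digit.
  return text.replace(/[^\p{L}0-9]/gu, '');
}

console.log(cleanWithPropertyEscapes('été 2015!'));
```

Note the negation: the pattern in the question removes the letters it matches, which is the opposite of what is wanted.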

You can create your own character class:

[a-zA-Z0-9À-ž]

Demo

If you want to allow those characters but replace characters outside those ranges, negate the character classes:

[^a-zA-Z0-9À-ž]

Demo
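Applied to the filter from the question, the negated class could be used roughly like this. This is only a sketch without jQuery; `cleanInput` is a hypothetical helper name, and the class is an approximation rather than full Unicode letter support:

```javascript
// Hypothetical helper: strips every character outside the allowed class.
function cleanInput(text) {
  // [^...] negates the class, so only unwanted characters are replaced.
  return text.replace(/[^a-zA-Z0-9À-ž]/g, '');
}

console.log(cleanInput('é è ô 123 !?')); // "éèô123" (spaces are removed too)
```

Inside the question's keyup handler, this would become something like `$('#TheBox').val(cleanInput($('#TheBox').val()));`. Add a space or a dash to the class if those should be allowed.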

Or as pointed out in comments:

[A-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]
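One caveat with a range written as `A-z`, also raised in the comments: it spans the ASCII punctuation that sits between `Z` and `a`. A quick check (the variable names are just for illustration):

```javascript
// A-z covers U+0041..U+007A, which includes six punctuation characters.
const wide = /^[A-z]$/;
const strict = /^[A-Za-z]$/;

// The characters between 'Z' (U+005A) and 'a' (U+0061).
const between = ['[', '\\', ']', '^', '_', '`'];

for (const ch of between) {
  // Each line prints the character followed by: true false
  console.log(ch, wide.test(ch), strict.test(ch));
}
```

Writing the start of the class as `A-Za-z` avoids matching those symbols.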
dawg
  • I'm probably missing something obvious here: http://jsfiddle.net/cv1o0ywc/1/ Can you look at it? – frenchie May 13 '15 at 21:59
  • You want to replace letters like `a b è` with nothing or something other than those letters? You can just negate it: `[^a-zA-Z0-9À-ž]` – dawg May 13 '15 at 22:08
  • Ok, this is really GREAT!!! I've updated the jsfiddle to this: http://jsfiddle.net/cv1o0ywc/2/ (added space and dash) and it works. I went to http://french.typeit.org/ to test it by copying diacritics into the fiddle and all seems fine. Thanks for the answer! – frenchie May 14 '15 at 04:34
  • Is there a case-sensitive variant for these? `À-Ž` does not match uppercase letters only. – idleberg Apr 21 '20 at 10:30
  • 1
    `[À-ž]` also supports "×" and "÷", which are usually not wanted, and does not support lots of other Latin letters with diacritics or others (such as German ß). You may need to fine-tune the range to something like `[A-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]` – Radek Pech Jul 21 '20 at 09:01
  • 1
    The range mentioned by @RadekPech also matches symbols like `[` or `_`, so I would recommend this adjustment: `[A-Za-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]` – Lenka Vraná Jun 04 '21 at 19:04
22

The [À-ž] character class includes the following characters, highlighted in yellow below.

[Image: character table with the characters matched by [À-ž] highlighted in yellow]
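Since a range in a character class is just a run of consecutive code points, the same set can also be listed with a few lines of JavaScript. A sketch; `charsInRange` is a hypothetical helper written for this illustration:

```javascript
// Hypothetical helper: returns every character from `first` to `last`, inclusive.
function charsInRange(first, last) {
  const chars = [];
  for (let cp = first.codePointAt(0); cp <= last.codePointAt(0); cp++) {
    chars.push(String.fromCodePoint(cp));
  }
  return chars;
}

const matched = charsInRange('À', 'ž'); // U+00C0 through U+017E
console.log(matched.length);            // 191
console.log(matched.join(''));          // the full list of matched characters
console.log(matched.includes('×'), matched.includes('÷')); // true true
```

The last line shows that the multiplication and division signs fall inside the range along with the letters.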

Quolonel Questions
  • 2
    Just wondering: is there a difference between À-ž and à-Ž, i.e. mirrored casing? – Sander Schaeffer Apr 24 '18 at 08:21
  • 1
    Update: Alright, if you do it the opposite way, you'll be eliminating about half of the characters. So big À and small ž is the way to go. – Sander Schaeffer Apr 24 '18 at 08:28
  • This range also supports "×" and "÷", which are usually not wanted, and does not support lots of other Latin letters with diacritics or others (such as German ß). You may need to fine-tune the range to something like `[A-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]` – Radek Pech Jul 21 '20 at 09:00
  • 6
    The range mentioned by @RadekPech also matches symbols like `[` or `_`, so I would recommend this adjustment: `[A-Za-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]` – Lenka Vraná Jun 04 '21 at 18:49
0

If someone is looking for only Polish diacritics: [A-Za-zĄ-ćĘęÓóŁ-ńŚśŹ-ż].
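A quick way to try that class out, as a sketch: the anchors and the `+` quantifier are additions for whole-string validation, and `polishWord` is a hypothetical name.

```javascript
// The class from above, anchored so the whole string must consist of allowed letters.
const polishWord = /^[A-Za-zĄ-ćĘęÓóŁ-ńŚśŹ-ż]+$/;

console.log(polishWord.test('zażółć')); // true
console.log(polishWord.test('gęślą'));  // true
console.log(polishWord.test('café'));   // false, é is not in the class
```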

FalconTech
-2

When using Java, you can use a Java regex for this:

import java.util.regex.Pattern;

public class WhitelistValidator {

    private static final String ANY_LETTER = "\\p{L}"; // includes diacritics
    private static final String NUMERIC = "0-9";
    public static final Pattern PATTERN = Pattern.compile(String.format("[%s%s]+", ANY_LETTER, NUMERIC));

    public boolean isValid(String valueUnderTest) {
        return PATTERN.matcher(valueUnderTest).matches();
    }
}

See https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

scre_www