Regexp with accented characters on match

Question

I have to check for forbidden words in a text area when the user tries to submit it. The forbidden words are stored in the jsBlackList array, and this is part of my code so far:

var fieldValue = value;
var hasForbiddenWord = false;
for (var i = 0; i < jsBlackList.length; i++) {
    var regex = new RegExp("\\b" + jsBlackList[i] + "\\b", "gi");
    // Check before replacing: once the word has become '***', match() can no longer find it
    hasForbiddenWord = hasForbiddenWord || fieldValue.match(regex) !== null;
    fieldValue = fieldValue.replace(regex, '***');
}
value = fieldValue;

The problem is that some entries in jsBlackList contain accented characters, while the user may type the word without accents (for example, jsBlackList may contain "déjà" and the user types "deja", "déja" or "dejà").

How can I still match the word when accents are missing?

Note on the "marked as duplicate" flag: the suggested duplicates are about a regex without accents matching text with accents, whereas mine is about a regex with accents matching text where accents may be missing.

You can try this: `d[ée]j[àa]` – Tim.Tang May 12 '15 at 07:46

score 2 · Answer 1 · answered May 12 '15 at 07:45

You need to create a list of equivalent characters and, in your regex, OR the equivalents together:

d(é|e)j(à|a)
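A minimal sketch of the "list of equivalences" idea, assuming a hand-maintained table. The `EQUIVALENCES` table and the `toAlternationPattern` helper are hypothetical names, and the table only covers a few French letters. Each letter that has equivalents is turned into a non-capturing alternation of all its forms:

```javascript
// Hypothetical equivalence table: each group lists a base letter and its accented forms.
// Only a few French letters are covered here; extend it to fit your blacklist.
const EQUIVALENCES = [
  ["a", "à", "â", "ä"],
  ["e", "é", "è", "ê", "ë"],
  ["i", "î", "ï"],
  ["o", "ô", "ö"],
  ["u", "ù", "û", "ü"],
  ["c", "ç"],
];

// Reverse lookup: any form (accented or not) -> the whole group it belongs to.
const GROUP_OF = new Map();
for (const group of EQUIVALENCES) {
  for (const form of group) GROUP_OF.set(form, group);
}

// "déjà" -> "d(?:e|é|è|ê|ë)j(?:a|à|â|ä)": every letter that has equivalents is
// replaced by an alternation of all of them; other characters are regex-escaped.
function toAlternationPattern(word) {
  let pattern = "";
  for (const ch of word.toLowerCase()) {
    const group = GROUP_OF.get(ch);
    pattern += group
      ? "(?:" + group.join("|") + ")"
      : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return pattern;
}
```

With the `i` flag, `new RegExp(toAlternationPattern("déjà"), "gi")` matches "deja", "déja" and "dejà". Because every form maps to its whole group, an unaccented blacklist entry such as "deja" would also match accented input.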

Konstantin Dinev

score 2 · Accepted Answer · answered May 12 '15 at 07:48

One way to accomplish this is to change your blacklist a bit:

Replace every accented character with an alternation of its accented and unaccented forms.

For example, "déjà" becomes "d(é|e)j(à|a)".

If your blacklist is big, you will probably want to automate these replacements, but in the end it is convenient to have the blacklist written like this.
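If the blacklist is large, the rewrite from "déjà" to "d(é|e)j(à|a)" can be generated rather than typed. A sketch, assuming the blacklist words are stored in precomposed (NFC) form; it uses Unicode NFD decomposition to find each accented letter's base letter. `accentAlternation`, `escapeForRegExp` and `jsBlackListPatterns` are hypothetical names:

```javascript
// Escape characters that have a special meaning in a RegExp pattern.
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "déjà" -> "d(é|e)j(à|a)".
// NFD splits "é" into "e" + U+0301 (combining acute accent); removing the
// combining marks (\p{M}) leaves the base letter.
function accentAlternation(word) {
  let pattern = "";
  for (const ch of word.normalize("NFC")) {
    const base = ch.normalize("NFD").replace(/\p{M}/gu, "");
    if (base !== ch && base.length === 1) {
      pattern += "(" + escapeForRegExp(ch) + "|" + escapeForRegExp(base) + ")";
    } else {
      pattern += escapeForRegExp(ch);
    }
  }
  return pattern;
}

// Hypothetical usage: precompute the patterns once for the whole blacklist.
const jsBlackListPatterns = ["déjà", "où"].map(accentAlternation);
// jsBlackListPatterns -> ["d(é|e)j(à|a)", "o(ù|u)"]
```

Unaccented letters in a blacklist entry are left alone, so this handles accents the user omitted, not accents the user added to a word stored without them.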

D. Cichowski

Character classes are better suited to this: `d[ée]j[àa]` – CupawnTae May 12 '15 at 07:49
In the end, this is what I used. I had to redo the blacklist, but I suppose it's the best solution. – Meowcate May 12 '15 at 09:27
Character classes are good as well. Readability preferences will probably decide. – D. Cichowski May 12 '15 at 10:16
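A sketch combining the character-class suggestion with the masking loop from the question (all helper names are hypothetical). Each accented letter becomes a class such as `[ée]`. One extra point worth flagging: in JavaScript, `\b` is defined in terms of `\w`, which is ASCII-only (`[A-Za-z0-9_]`), so `\bdéjà\b` does not match "déjà" followed by a space. The sketch therefore uses Unicode lookarounds as word boundaries, which assumes an engine with lookbehind and `\p{...}` support (the `u` flag), such as current Node.js or Chromium:

```javascript
// Unicode-aware word boundaries: JavaScript's \b is based on the ASCII-only \w,
// so it never sees a boundary next to "é" or "à". \p{L} and \p{N} need the "u" flag.
const NOT_WORD_BEFORE = "(?<![\\p{L}\\p{N}_])";
const NOT_WORD_AFTER = "(?![\\p{L}\\p{N}_])";

// "déjà" -> "d[ée]j[àa]": each accented letter becomes a class with its base letter.
function toCharClassPattern(word) {
  let pattern = "";
  for (const ch of word.normalize("NFC")) {
    const base = ch.normalize("NFD").replace(/\p{M}/gu, "");
    if (base !== ch && base.length === 1) {
      pattern += "[" + ch + base + "]";
    } else {
      pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex syntax
    }
  }
  return pattern;
}

// Masks every blacklisted word with "***" and reports whether anything was masked.
// A word counts as found when replace() actually changed the text.
function maskForbiddenWords(text, blackList) {
  let hasForbiddenWord = false;
  for (const word of blackList) {
    if (word.length === 0) continue; // an empty entry would match between any two non-letters
    const regex = new RegExp(NOT_WORD_BEFORE + toCharClassPattern(word) + NOT_WORD_AFTER, "giu");
    const masked = text.replace(regex, "***");
    if (masked !== text) {
      hasForbiddenWord = true;
      text = masked;
    }
  }
  return { text, hasForbiddenWord };
}
```

With the `i` and `u` flags together, case-insensitive matching also covers accented capitals, so "DÉJA" is masked as well.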

score 0 · Answer 3 · answered May 12 '15 at 07:44

I think your best bet is to:

remove all accents from the words in the blacklist,
process the text, replacing accented characters with their unaccented equivalents.

Then you can compare without worrying about accents.
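A sketch of this approach, assuming blacklist entries are single words (`stripAccents` and `containsForbiddenWord` are hypothetical names). NFD decomposition separates each accented letter into its base letter plus combining marks, and the marks are then removed from both sides before comparing:

```javascript
// Remove diacritics: NFD splits "é" into "e" + U+0301, then every combining mark is dropped.
function stripAccents(s) {
  return s.normalize("NFD").replace(/\p{M}/gu, "");
}

// Hypothetical helper: true if any blacklisted word (compared without accents
// and case-insensitively) appears as a whole word in the text.
function containsForbiddenWord(text, blackList) {
  const banned = new Set(blackList.map((w) => stripAccents(w).toLowerCase()));
  const words = stripAccents(text).toLowerCase().split(/[^\p{L}\p{N}_]+/u);
  return words.some((w) => banned.has(w));
}
```

This only answers whether a forbidden word is present; writing the stripped text back to the field would also remove the accents from every other word.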

Antoine

That would be the easy way, but as you can see I have to replace each matched forbidden word with ***. If I do as you explain, it would be harder to replace only the matched word, since the rest of the text needs to keep its accented characters. – Meowcate May 12 '15 at 08:07
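One possible way around this objection, sketched under the assumption that the text is in precomposed (NFC) form: fold accents one UTF-16 code unit at a time so the folded copy has exactly the same length as the original, search the folded copy, and replace the same index ranges in the original text, whose accents stay untouched. `foldPreservingLength` and `maskKeepingAccents` are hypothetical names:

```javascript
// Fold accents one UTF-16 code unit at a time, so that folded.length === text.length
// and an index found in the folded copy points at the same character in the original.
function foldPreservingLength(text) {
  let folded = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const base = ch.normalize("NFD").replace(/\p{M}/gu, "");
    // Substitute only when folding yields exactly one code unit; otherwise keep ch.
    folded += base.length === 1 ? base : ch;
  }
  return folded;
}

// Unicode-aware word boundaries (JavaScript's \b only knows ASCII letters).
const BEFORE = "(?<![\\p{L}\\p{N}_])";
const AFTER = "(?![\\p{L}\\p{N}_])";

function maskKeepingAccents(text, blackList) {
  const folded = foldPreservingLength(text);
  const ranges = [];
  for (const word of blackList) {
    if (word.length === 0) continue; // an empty entry would match everywhere
    const pattern = foldPreservingLength(word).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const regex = new RegExp(BEFORE + pattern + AFTER, "giu");
    for (const m of folded.matchAll(regex)) {
      ranges.push([m.index, m.index + m[0].length]);
    }
  }
  // Replace right-to-left so the indices of earlier ranges stay valid.
  ranges.sort((a, b) => b[0] - a[0]);
  let result = text;
  let lastStart = Infinity;
  for (const [start, end] of ranges) {
    if (end > lastStart) continue; // overlaps a range already masked
    result = result.slice(0, start) + "***" + result.slice(end);
    lastStart = start;
  }
  return { text: result, hasForbiddenWord: ranges.length > 0 };
}
```

For example, `maskKeepingAccents("Il est DÉJÀ là, déja vu", ["déjà"]).text` gives "Il est *** là, *** vu": the forbidden words are masked while "là" keeps its accent.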
