
I have to check for forbidden words in a textarea when a user tries to validate. The forbidden words list is stored in the jsBlackList array, and this is part of my code so far:

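var fieldValue = value;
var hasForbiddenWord = false;
for (var i = 0; i < jsBlackList.length; i++) {
    // Note: JavaScript's \b only treats [A-Za-z0-9_] as word characters,
    // so it does not behave as expected next to accented letters such as "à".
    var regex = new RegExp("\\b" + jsBlackList[i] + "\\b", "gi");
    // Compare before/after: once a word is replaced by ***, the regex can no longer match it.
    var replaced = fieldValue.replace(regex, '***');
    hasForbiddenWord = hasForbiddenWord || replaced !== fieldValue;
    fieldValue = replaced;
}
value = fieldValue;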

The problem is that some jsBlackList entries contain accented characters, while the user may type them without accents (for example, jsBlackList can contain "déjà", and the user has typed "deja", "déja" or "dejà").

How can I match these words when some accents are missing?

NB about "Marked as duplicate": the suggested duplicates are about using a regexp without accents to match text with accents; mine is about using a regexp with accents to match text that may be missing accents.

Meowcate

3 Answers


You need to create a list of equivalences and, in your regex, OR together all the equivalent characters:

d(é|e)j(à|a)
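A minimal JavaScript sketch of the equivalence-list idea, assuming an ES2018+ engine (needed for `\p{L}` and lookbehind). `EQUIVALENCES`, `escapeRegExp`, `accentTolerantPattern` and `blacklistRegex` are hypothetical names, and the table only covers common French accented letters. It uses character classes such as `[ée]`, which behave like the alternation `(é|e)` for single characters, and replaces the question's `\b` with Unicode-aware lookarounds, because `\b` does not treat accented letters as word characters.

```javascript
// Hypothetical equivalence table: each accented letter maps to a character
// class that also accepts its unaccented form. Extend it for your language.
const EQUIVALENCES = {
  "à": "[àa]", "â": "[âa]", "ä": "[äa]",
  "é": "[ée]", "è": "[èe]", "ê": "[êe]", "ë": "[ëe]",
  "î": "[îi]", "ï": "[ïi]",
  "ô": "[ôo]", "ö": "[öo]",
  "ù": "[ùu]", "û": "[ûu]", "ü": "[üu]",
  "ç": "[çc]",
};

// Escape regex metacharacters so a blacklist entry is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern in which every accented letter also accepts its bare form.
function accentTolerantPattern(word) {
  return Array.from(word.toLowerCase())
    .map((ch) => EQUIVALENCES[ch] || escapeRegExp(ch))
    .join("");
}

// Word boundaries based on Unicode letters/digits instead of the ASCII-only \b.
function blacklistRegex(word) {
  return new RegExp(
    "(?<![\\p{L}\\p{N}_])" + accentTolerantPattern(word) + "(?![\\p{L}\\p{N}_])",
    "giu"
  );
}

console.log("c'est deja fait, déja vu".replace(blacklistRegex("déjà"), "***"));
// c'est *** fait, *** vu
```

Each blacklist entry then gets its own regex (for example `jsBlackList.map(blacklistRegex)`), and the replacement loop from the question can stay as it is.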
Konstantin Dinev

One way to accomplish this is to change your blacklist a bit:

Replace every accented character with an alternation of its accented and unaccented forms.

For example, "déjà" becomes "d(é|e)j(à|a)".

If your blacklist is big, you will probably want to automate these replacements, but in the end it is convenient to have the blacklist written like this.
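A sketch of how the replacement could be automated in JavaScript, instead of using a hand-written table. `toAlternation` is a hypothetical helper, and it assumes that a letter counts as accented when its Unicode canonical decomposition (`normalize("NFD")`) is one base letter followed by combining marks in the range U+0300 to U+036F. Word boundaries are left out here; note that JavaScript's `\b` treats accented letters as non-word characters, so it does not combine well with these patterns.

```javascript
// Hypothetical helper: turns "déjà" into "d(é|e)j(à|a)".
function toAlternation(word) {
  return Array.from(word.normalize("NFC"))
    .map((ch) => {
      // Decompose the character and drop combining diacritical marks.
      const base = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
      if (base !== ch && base.length === 1) {
        // Accept either the accented letter or its bare base letter.
        return "(" + ch + "|" + base + ")";
      }
      // Escape regex metacharacters so other characters match literally.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
}

console.log(toAlternation("déjà")); // d(é|e)j(à|a)

const regex = new RegExp(toAlternation("déjà"), "gi");
console.log("deja, déja, dejà".replace(regex, "***")); // ***, ***, ***
```

Running `toAlternation` once over the whole blacklist gives the "written like this" form the answer recommends, without editing each entry by hand.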

D. Cichowski

I think your best bet is to:

  • remove all accents from the blacklist entries,
  • process the text to replace accented characters with their unaccented equivalents.

Then you can compare the two without worrying about accents.
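A minimal JavaScript sketch of this "strip accents on both sides" approach, for detection only. It assumes accents can be removed by decomposing with `normalize("NFD")` and dropping combining marks U+0300 to U+036F; `stripAccents` and `findForbiddenWords` are hypothetical names, and the Unicode-aware boundaries (ES2018+) stand in for `\b`, which does not handle accented letters.

```javascript
// Strip diacritics: decompose (NFD), then drop combining marks U+0300 to U+036F.
function stripAccents(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Returns the blacklist entries found in text, ignoring accents and case.
function findForbiddenWords(text, blackList) {
  const folded = stripAccents(text).toLowerCase();
  return blackList.filter((word) => {
    const bare = stripAccents(word)
      .toLowerCase()
      .replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // match the entry literally
    const regex = new RegExp(
      "(?<![\\p{L}\\p{N}_])" + bare + "(?![\\p{L}\\p{N}_])",
      "u"
    );
    return regex.test(folded);
  });
}

// returns ["déjà"]
const found = findForbiddenWords("C'était déja vu", ["déjà", "zut"]);
console.log(found);
```

This tells you which words are present, but the match positions refer to the accent-stripped copy rather than to the original text.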

Antoine
  • That would be the easy way, but as you can see I have to replace each matched forbidden word with ***. If I do as you explain, it would be harder to mask only the matched word, since the rest of the text needs to keep its accented characters. – Meowcate May 12 '15 at 08:07
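One possible way around this objection, sketched in JavaScript under the assumption that the text is in NFC (precomposed) form, as typed input usually is: fold accents one character at a time so the folded copy has exactly the same length as the original, search in the folded copy, and mask the same index ranges in the original text, whose accents stay untouched. `foldSameLength` and `maskForbiddenWords` are hypothetical helpers. Case-insensitivity comes from the `i` flag rather than `toLowerCase()`, because lowercasing can change a string's length (for example "İ").

```javascript
// Fold accents per character, keeping the folded copy the same length as the
// original so that index i in one string is index i in the other.
function foldSameLength(text) {
  return Array.from(text)
    .map((ch) => {
      const bare = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
      // Keep characters whose folding would change the length (e.g. lone marks).
      return bare.length === ch.length ? bare : ch;
    })
    .join("");
}

function maskForbiddenWords(text, blackList) {
  const folded = foldSameLength(text);
  const hit = new Array(text.length).fill(false);
  for (const word of blackList) {
    const bare = foldSameLength(word).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const regex = new RegExp(
      "(?<![\\p{L}\\p{N}_])" + bare + "(?![\\p{L}\\p{N}_])",
      "giu"
    );
    // Mark the matched positions; they are the same in the original text.
    for (const m of folded.matchAll(regex)) {
      for (let i = m.index; i < m.index + m[0].length; i++) hit[i] = true;
    }
  }
  // Rebuild from the ORIGINAL text: each run of marked positions becomes "***".
  let out = "";
  for (let i = 0; i < text.length; i++) {
    if (!hit[i]) out += text[i];
    else if (i === 0 || !hit[i - 1]) out += "***";
  }
  return { value: out, hasForbiddenWord: hit.includes(true) };
}

console.log(maskForbiddenWords("Il est déja là, dejà vu", ["déjà"]).value);
// Il est *** là, *** vu
```

Only the matched words are replaced, while "là" keeps its accent, which is the behaviour the comment asks for.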