Helper function: How to disable forbidden words?

Question

I've been tasked with adding a filtering function for forbidden usernames. I've created a huge list of these names and split it into two parts: one is a list of exact matches, the other a list of phrase/word matches.

The exact match works as intended (e.g. "admin" is blocked, while "adminello" is allowed), but the phrase/word match only works when the word appears as a separate word within a sentence, e.g. "sh*t hello". It should also work with dashes, e.g. "sh*t-hello", and even with combined strings like "sh*thsh*thellosh*t".

Should I also divide the list into separate files? At the moment it's a single JSON file like this:

{
"admin": 1,
"sh*t": 2
}

Helper function (badNamesList - JSON file)

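const _ = require('lodash');
// Assumption: `errors` comes from @feathersjs/errors (the hook signature suggests Feathers).
const errors = require('@feathersjs/errors');
// Assumption: hypothetical path to the JSON file shown above.
const badNamesList = require('./badNamesList.json');

// 1 = exact match, 2 = phrase/word match. Built once instead of on every call.
const exactMatchList = _.keys(_.pickBy(badNamesList, type => type === 1));
const phraseMatchList = _.keys(_.pickBy(badNamesList, type => type === 2));

module.exports = (options = {}) => {
  return async context => {
    const argumentsList = _.get(context, 'arguments', []);
    const username = _.size(argumentsList) > 1 ? _.get(argumentsList[1], 'username') : null;
    if (username) {
      const lowerName = _.toLower(username);
      // Exact match: the whole username equals a forbidden name.
      if (_.includes(exactMatchList, lowerName)) {
        throw new errors.BadRequest({ fieldErrors: { username: 'forbiddenUsername' } });
      }
      // Phrase match: only catches a forbidden word that is a whole token between "-" or " ".
      const tokens = _.split(lowerName, /-| /);
      if (_.some(phraseMatchList, name => _.includes(tokens, name))) {
        throw new errors.BadRequest({ fieldErrors: { username: 'forbiddenUsername' } });
      }
    }
    return context;
  };
};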

Check out my answer on this similar question: https://stackoverflow.com/a/56491415/2784493 - this might help you. — I am L, Sep 19 '19 at 07:22
While it's a nice list of forbidden names, it still allows things like 'shltshlt' or 'dlckshlt', etc. Adding those in manually is a waste of time — Alex, Sep 19 '19 at 07:33
Well, at least you don't have to think of the other stuff like `assH0le` (with a zero) and other common bad words, unless you want to manually add ALL of them to your list, which I think is an even bigger waste of time. — I am L, Sep 19 '19 at 07:40

score 0 · Answer 1 · answered Sep 19 '19 at 07:54

Any ideas how to do this properly?

You could use the Levenshtein distance algorithm.

So you can:

1. Normalize the username: convert it to lowercase, replace digits with the letters they imitate (e.g. 0 = o), and remove duplicated strings (e.g. shitshit becomes shit).
2. Compute the distance between the normalized username and each word in the list of banned words.
3. If the distance is zero or very small, block the username.

With this approach you can get false positives, such as "shot", which has a distance of 1 to "shit", but you don't need to maintain an "up to infinity" dictionary of every possible variant.
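The steps above could be sketched roughly as follows. This is only a minimal illustration, not a complete filter: the names `LOOKALIKES`, `normalize`, `collapseRepeats`, `levenshtein` and `isForbidden`, the look-alike character map, and the default threshold of 1 are all assumptions made for the example.

```javascript
// Hypothetical look-alike map (an assumption); extend it to suit your own list.
const LOOKALIKES = { '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's' };

// Lowercase, map look-alike characters to letters, then drop everything that is not a-z.
function normalize(username) {
  return username
    .toLowerCase()
    .replace(/[013457@$]/g, ch => LOOKALIKES[ch])
    .replace(/[^a-z]/g, '');
}

// Collapse an immediately repeated chunk of 2+ characters: "shitshit" -> "shit".
// This is lossy for ordinary words too (e.g. "banana" -> "bana").
function collapseRepeats(text) {
  return text.replace(/(.{2,}?)\1+/g, '$1');
}

// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Block the username if its normalized form is within maxDistance of any banned word.
function isForbidden(username, bannedWords, maxDistance = 1) {
  const candidate = collapseRepeats(normalize(username));
  return bannedWords.some(word => levenshtein(candidate, word) <= maxDistance);
}

console.log(isForbidden('ShitShit', ['shit'])); // true
console.log(isForbidden('sh1t', ['shit']));     // true
console.log(isForbidden('shot', ['shit']));     // true (the false positive mentioned above)
console.log(isForbidden('shot', ['shit'], 0));  // false
```

Note that this sketch compares the whole username against each banned word, so a banned word embedded in a longer name (such as "shit-hello") would not be caught by it. Lowering `maxDistance` reduces false positives at the cost of missing more misspellings.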
