
I've been tasked with adding a filter for forbidden usernames. I've compiled a large list of these names and split it into two parts: one list for exact matches and one for phrase/word matches.

The exact match works as intended (e.g. "admin" is blocked, "adminello" is allowed), but the phrase/word match only works if the word appears as a separate word within a sentence, e.g. "sh*t hello". It should also work with dashes, e.g. "sh*t-hello", and even with combined strings like "sh*tsh*thellosh*t".

Should I also split the list? At the moment it's a single JSON file like:

{
"admin": 1,
"sh*t": 2
}

Helper function (badNamesList is the JSON file above):

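const _ = require('lodash');
// Assumption: errors.BadRequest comes from Feathers; the original snippet omits its imports.
const errors = require('@feathersjs/errors');
// Assumption: hypothetical path to the JSON file shown above.
const badNamesList = require('./badNamesList.json');

module.exports = (options = {}) => {
  return async context => {
    const argumentsList = _.get(context, 'arguments', []);
    const username = _.size(argumentsList) > 1 ? _.get(argumentsList[1], 'username') : null;
    if (username) {
      const lowered = _.toLower(username);
      // Keys flagged 1 must match the whole username; keys flagged 2 must match a single word.
      const exactMatchList = _.keys(_.pickBy(badNamesList, flag => flag === 1));
      const phraseMatchList = _.keys(_.pickBy(badNamesList, flag => flag === 2));
      if (_.includes(exactMatchList, lowered)) {
        throw new errors.BadRequest({ fieldErrors: { username: 'forbiddenUsername' } });
      }
      // Split on dashes and spaces, then compare each word against the phrase list.
      const words = _.split(lowered, /-| /);
      if (_.some(phraseMatchList, name => _.includes(words, name))) {
        throw new errors.BadRequest({ fieldErrors: { username: 'forbiddenUsername' } });
      }
    }
    return context;
  };
};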
Dharman
Alex
  • Check out my answer on this similar question: https://stackoverflow.com/a/56491415/2784493 - this might help you. – I am L Sep 19 '19 at 07:22
  • While it's a nice list of forbidden names, it still allows things like 'shltshlt' or 'dlckshlt', etc. Adding those in manually is a waste of time – Alex Sep 19 '19 at 07:33
  • Well, at least you don't have to think of other variants like `assH0le` (with a zero) and other common bad words, unless you want to add ALL of them to your list manually, which I think is an even bigger waste of time. – I am L Sep 19 '19 at 07:40

1 Answer


Any ideas how to do this properly?

You could use the Levenshtein distance algorithm.

So you can:

  • transform the username to lowercase, map digits to letters (e.g. 0 → o), and collapse repeated strings (e.g. shitshit → shit)
  • compute the distance between the username and each word in the list of banned words
  • if the distance is zero or below a small threshold, block it

This can produce false positives, e.g. shot has distance 1 from shit, but you don't need to maintain an endlessly growing dictionary of variants.
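
As a rough sketch of the steps above in plain JavaScript: the `bannedWords` array, the `substitutions` table, and the `normalize`, `levenshtein`, and `isForbidden` helpers are all hypothetical names made up for illustration, not part of the question's code, and the threshold of 1 is just an assumed starting point.

```javascript
// Hypothetical banned-word list; in the question this would come from the JSON file.
const bannedWords = ['shit', 'admin'];

// Assumed character substitutions; extend to suit your own list.
const substitutions = { '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's' };

// Lowercase, map known look-alike characters to letters, drop every other
// non-letter (dashes, spaces, ...), then collapse immediately repeated runs.
function normalize(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z]/g, ch => substitutions[ch] || '')
    .replace(/(.+?)\1+/g, '$1');
}

// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Block the name if it is within maxDistance edits of any banned word.
function isForbidden(username, maxDistance = 1) {
  const candidate = normalize(username);
  return bannedWords.some(word => levenshtein(candidate, normalize(word)) <= maxDistance);
}
```

With the default threshold of 1, `isForbidden('SH1T-sh1t')` and `isForbidden('shot')` both return `true`, which shows the false-positive trade-off; `isForbidden('shot', 0)` returns `false`. Note that whole-string distance will not flag a banned word embedded in a longer name such as "shithello", so you may still want a substring check on the normalized name.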

Manuel Spigolon