
There are many questions on this topic, but I'm not sure if my regex is vulnerable or not. The following regex is what I use for email validation:

/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/.test(email)

Because I'm using a * in a few places, I suspect it could be.

I'd like to be able to test any number of regexes in my code for this problem.

I'm using Node.js, so this could shut down my server entirely given the single-threaded nature of the event loop.

Gary
  • Your regex for testing email is really poor. The best way to validate an email address is to send an email and check the return value. Please, have a look at these sites: [TLD list](https://www.iana.org/domains/root/db); [valid/invalid addresses](https://en.wikipedia.org/wiki/Email_address#Examples); [regex for RFC822 email address](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) – Toto Aug 22 '20 at 10:12
  • @Toto, that regex is intense! My new regex is simple; if the user doesn't get an email, he/she can't validate so even if it passes my checks, it's a sort of fruitless endeavor - /^.+@.+\.\w{2,3}$/.test(email) – Gary Aug 22 '20 at 17:05

2 Answers


Good question. Yes, given the right input, it's vulnerable and a runaway regex is able to block the entire node process, making the service unavailable.
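A quick way to see this for yourself is to time the regex from the question against inputs that can never match. The sketch below is only an illustration: `timeTest` is a hypothetical helper name, and the input lengths are deliberately small so the script finishes quickly. Absolute timings depend on the machine and Node version, so only the growth trend is meaningful.

```javascript
// The regex from the question, unchanged.
const emailRegex = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Hypothetical helper: time a single .test() call in milliseconds.
function timeTest(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// A run of word characters with no "@": the engine tries the different ways
// of splitting the run between \w+ and the starred group before it fails.
for (const n of [14, 18, 22]) {
  const { matched, elapsedMs } = timeTest(emailRegex, 'a'.repeat(n) + '!');
  console.log(`n=${n} matched=${matched} elapsed=${elapsedMs.toFixed(2)}ms`);
}
```

Keep `n` small when experimenting. Each extra character roughly doubles the work, so an input of a few dozen characters can already stall the process.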

The basic example of a regex prone to catastrophic backtracking looks like

^(\w+)*$

a pattern which can be found multiple times in the given regex.
When a regex nests quantifiers like this and the input contains a long sequence that ultimately cannot match, the JS regex engine has to try every way of splitting that sequence between the inner and the outer quantifier before giving up. The number of attempts roughly doubles with each extra character, so a long enough input keeps the CPU busy for what is effectively forever. (You can play with this on regex101 as well using your regex by adjusting the timeout value in the settings.)
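In the regex from the question, the nesting comes from `\w+([\.-]?\w+)*`: because the separator is optional, a plain run of word characters can be divided between `\w+` and the starred group in many ways. One possible fix, sketched below, is to make the separator mandatory inside the group. The `rewritten` pattern is my own assumption rather than something from the question, and the comparison covers only a handful of sample strings, so treat it as a starting point and still run it through a checker.

```javascript
// Original from the question: optional separator inside a starred group.
const original = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Hypothetical rewrite: every repetition must begin with "." or "-",
// so a run of word characters can only be consumed by a single \w+.
const rewritten = /^\w+([.-]\w+)*@\w+([.-]\w+)*(\.\w{2,3})+$/;

const samples = [
  'john.doe@example.com',
  'user@sub.domain.co.uk',
  'john..doe@example.com',
  'a@b.c',
  'no-at-sign.example.com',
];

// Print both verdicts side by side for each sample.
for (const s of samples) {
  console.log(s, original.test(s), rewritten.test(s));
}

// Only the rewritten pattern is run against the hostile input here.
const hostile = 'a'.repeat(50) + '!';
console.log('hostile input matched:', rewritten.test(hostile));
```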

In general,

  • avoid monstrosities,
  • use HTML5 input validation whenever possible (in the front-end),
  • use established validation libraries for common input, e.g. validator.js,
  • try to detect potentially catastrophic exponential-time regular expressions ahead of time using tools like safe-regex & vuln-regex-detector (those offer pretty much what you had in mind),
  • and know your stuff 1, 2, 3 (I think the third link explains the issue best).

More drastic approaches to mitigating catastrophic backtracking in Node.js are to wrap your regex work in a child process or vm context and set a meaningful timeout. (In a perfect world JavaScript's RegExp constructor would have a timeout param, maybe someday.)

  1. The approach of using a child process is described here on SO.

  2. The VM context (sandboxing) approach is described here.
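As a rough sketch of the vm variant, the snippet below runs `regex.test()` inside a fresh context using the `timeout` option of Node's built-in `vm` module. The helper name `testWithTimeout`, its return shape, and the 50 ms budget are assumptions made for illustration. Note that the call is still synchronous, so the event loop is blocked for up to the timeout. This only caps the damage.

```javascript
const vm = require('vm');

// The regex from the question, unchanged.
const emailRegex = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Hypothetical helper: run regex.test(input) in a new context and give up
// after timeoutMs. A timeout is reported as a non-match with timedOut=true.
function testWithTimeout(regex, input, timeoutMs = 100) {
  const sandbox = { regex, input, result: false };
  try {
    vm.runInNewContext('result = regex.test(input);', sandbox, {
      timeout: timeoutMs,
    });
    return { matched: sandbox.result, timedOut: false };
  } catch (err) {
    // Node reports an exceeded vm timeout with this error code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { matched: false, timedOut: true };
    }
    throw err; // Anything else is unexpected; let the caller see it.
  }
}

console.log(testWithTimeout(emailRegex, 'john.doe@example.com'));
console.log(testWithTimeout(emailRegex, 'a'.repeat(40) + '!', 50));
```

Treating a timeout as "invalid input" is one reasonable policy for validation. Logging these cases separately helps you tell hostile input apart from ordinary typos.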

wp78de
  • Thanks, I literally just posted this question as well: https://stackoverflow.com/questions/63532067/safe-regex-function – Gary Aug 22 '20 at 03:06
  • I think the only part of this post that's relevant for the question are the last two bullet points. And it doesn't really answer the title question whether the OP's specific regex is vulnerable or not. – Bergi Aug 22 '20 at 03:24
  • @Bergi it does: " Yes, given the right input, it's vulnerable and a runaway regex is able to block the entire node process, making the service unavailable" – wp78de Aug 22 '20 at 03:27
  • @wp78de That sounds much like general information, not a specific answer. Which part of the regex is problematic? What's the right input to cause catastrophic backtracking? – Bergi Aug 22 '20 at 03:31
  • @Bergi So, basically, you want me to remove everything except the last two bullet-points? – wp78de Aug 22 '20 at 03:43
  • @Bergi are you happy now? – wp78de Aug 22 '20 at 04:21
  • @wp78de, thanks ... I used safe-regex and tested my regex w/it. – Gary Aug 22 '20 at 17:08
0
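const Joi = require('@hapi/joi');

// Returns true for a valid address, otherwise [false, message].
// Note: the array is truthy, so callers should compare with `=== true`.
function isEmail(emailAsStr) {
    // Wrap the string in an object so Joi can validate it as the `email` key.
    const schema = Joi.object({ email: Joi.string().email() });
    const result = schema.validate({ email: emailAsStr });

    if (!result.error) return true;
    return [false, result.error.details[0].message];
}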

Here's another way to do it - use a library! :) I tested it against common catastrophic backtracking inputs. The answer to my original question is to use the npm library safe-regex, but I thought I'd share another example of how to resolve this problem without a regex.

Gary