13

Is it possible to cancel a regex.match operation if takes more than 10 seconds to complete?

I'm using an huge regex to match a specific text, and sometimes may work, and sometimes can fail...

regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+

Working example: https://regex101.com/r/kU6rS5/1

So.. i want cancel the operation if takes more than 10 seconds. Is it possible? I'm not finding anything related in sof

Thanks.

  • 4
    Ummmmmm... what the heck are you trying to match here? – Sebastian Lenartowicz Aug 09 '16 at 20:12
  • 1
    On regex101 it says: "The script has halted execution as it exceeded a maximum execution time of 2s. This would likely occur when your expression results in what is known as catastrophic backtracking. I have halted the execution for you and will resume it after you have modified your expression or match string." http://www.regular-expressions.info/catastrophic.html - You cannot halt a regex.match if it takes too much time, I think you need reevaluate your regular expression. – Rob M. Aug 09 '16 at 20:12
  • Hm, that's why is working in my application. But, still... is taking 3 minutes to complete.. i want cancel to avoid blocking my server... –  Aug 09 '16 at 20:14
  • The reason it's taking so long is because the back references (`\1`, which appears in several places). I'm not sure exactly what you're trying to match, but if you can find a way to extract the number from the capture group in a separate expression, then generate a second expression dynamically, using that value in place of the `\1`'s, it should run a lot faster. – Matthew Crumley Aug 09 '16 at 21:38
  • As a proof of concept, here's a forked version of the regex with the capture and back references replaced with 0057 that runs relatively quickly: https://regex101.com/r/eE5eB1/2 It's still somewhat fragile though and will time out if it fails to match anything. – Matthew Crumley Aug 09 '16 at 21:52
  • hm... i get it! @MatthewCrumley –  Aug 09 '16 at 22:04
  • 1
    The point is that the expression above is poorly written: 1) the `(?:[\s\S]*?)` must be removed because they were not even meant to be there, use a `*` quantifier on the non-capturing groups to correctly unroll the lazy dot matching patterns (you have 2 here), 2) the second unrolled pattern is meaningless, you may use `[\s\S]*?`, 3) the final subpattern (in the negative lookaheads) quantifiers should be removed for quicker matching. – Wiktor Stribiżew Aug 09 '16 at 22:22
  • 1
    Here is the correct regex that will still be slow since the input is huge and has a lot of `P` and `Z` and the pattern is long by itself: [`MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d)[^P]*)*)PÁG\s:\s+\d+\/(\d+)\b[\s\S]*?\1\/\1[^Z]*(?:Z(?!6:\s\d)[^Z]*)*Z6:\s+\d+`](https://regex101.com/r/wT1cF9/1). – Wiktor Stribiżew Aug 09 '16 at 22:24
  • You can execute synchronous code with timeout with https://nodejs.org/docs/latest/api/vm.html#vm_vm_runinnewcontext_code_contextobject_options – konrad Jun 09 '20 at 10:20

3 Answers3

7

You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.

fork is probably what you should use, if you go down this road.

If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:

index.js

const { fork } = require('child_process');
const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;

regexProcess.on('message', function(data) {
  console.log('received message from child:', data);
  clearTimeout(timeout);
  received = data;
  regexProcess.kill(); // or however you want to end it. just as an example.
  // you have access to the regex data here.
  // send to a callback, or resolve a promise with the value,
  // so the original calling code can access it as well.
});

const timeoutInMs = 10000;
let timeout = setTimeout(() => {
  if (!received) {
    console.error('regexProcess is still running!');
    regexProcess.kill(); // or however you want to shut it down.
  }
}, timeoutInMs);

regexProcess.send('message to match against');

regex-process.js

function respond(data) {
  process.send(data);
}

function handleMessage(data) {
  console.log('handing message:', data);
  // run your regex calculations in here
  // then respond with the data when it's done.

  // the following is just to emulate
  // a synchronous computational delay
  for (let i = 0; i < 500000000; i++) {
    // spin!
  }
  respond('return regex process data in here');
}

process.on('message', handleMessage);

This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.

dvlsg
  • 5,378
  • 2
  • 29
  • 34
  • Hei @dvlsg, would you say this solution is bad? I need block because outside users will use it... –  Aug 09 '16 at 22:05
  • Depends on what your definition of bad is! And if I understand the problem correctly. If you know for sure that you will have a `regex.match` which may or may not take up to 10 seconds and the regex itself cannot be improved, but you want to make sure that your main process isn't getting blocked, then something similar to this is probably what you want. The child process will be blocked while the regex runs, and you may want to monitor your CPU usage on the machine, but the main process will continue to run unhindered. – dvlsg Aug 09 '16 at 22:07
  • Spawning a child process is hella expensive, but it works. Note that this adds significant non-deterministic overhead to regexes executed in this fashion. – Ivan Rubinson Jun 30 '19 at 09:34
  • Spawning a process for each regex evaluation might become extremely expensive when the number of simultaneously evaluated regex is high (think of servers that serve millions of requests per minute). – omer Jun 30 '19 at 09:39
  • Correct - you should weigh your needs and adjust appropriately. You can always leave processes alive in a process pool and send regexes to the pool as necessary if that fits your needs better. I suspect a server expected to serve millions of requests per minute isn't going to want to run regexes that can take up to 10 seconds per request, though, regardless of whether or not new processes are spun up. – dvlsg Jun 30 '19 at 19:35
3

Another solution I found here: https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/

Based on the use of VM, no process fork. That's pretty.

    const util = require('util');
    const vm = require('vm');

    var sandbox = {
        regex:/^(A+)*B/,
        string:"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
        result: null
    };

    var context = vm.createContext(sandbox);
    console.log('Sandbox initialized: ' + vm.isContext(sandbox));
    var script = new vm.Script('result = regex.test(string);');
    try{
        // One could argue if a RegExp hasn't processed in a given time.
        // then, its likely it will take exponential time.
        script.runInContext(context, { timeout: 1000 }); // milliseconds
    } catch(e){
        console.log('ReDos occurred',e); // Take some remedial action here...
    }

    console.log(util.inspect(sandbox)); // Check the results
Emmanuel Sellier
  • 526
  • 1
  • 5
  • 13
1

I made a Node.js package specifically for this called super-regex:

import {isMatch} from 'super-regex';

console.log(isMatch(/\d+/, getUserInput(), {timeout: 10000}));

Instead of executing in a worker or child process, it uses the Node.js vm module to execute in a new context. This means the execution is faster and can remain synchronous.

Sindre Sorhus
  • 62,972
  • 39
  • 168
  • 232