0

In the context of using regexp for matching documents against several keywords it could takes quadratic time (number of documents × number of keywords).

With a set of the following keywords:

['ant', 'antelope', 'albatross', 'alligator']

I would produce the stupid regex:

/ant|antelope|albatross|alligator/

But I'm looking for a solution which convert the set into something like:

/a(nt(|elope)|l(batross|ligator))/

which is quite more efficient

np42
  • 847
  • 7
  • 13

1 Answers1

0

I produced a simple library to make that work available on npm npm i regexp-factorize which can be used as following

import { factorizeOr } from 'regexp-factorize';

const keywords = ['ant', 'antelope', 'albatross', 'alligator'];
const regexpString = factorizeOr(keywords);
const regexp = new RegExp(regexpString);

const document = "There's an alligator in my house";

console.log(regexp);                // /a(?:nt(?:|elope)|l(?:batross|ligator))/
console.log(regexp.test(document)); // true
np42
  • 847
  • 7
  • 13