I am trying to build a RegExp from a series of smaller regexes, each given either as a string or as a regex literal.

I'm using Node v10.15.0.

Here are my 3 components individually:

Month Matcher: /\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b)/i

Day Matcher: /(?<day>\d{1,2})/i

Year Matcher: /(?<year>20\d\d)/i

I am trying to create a Regexp from each of these which would look something like this:

new RegExp(/\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b) (?<day>\d{1,2}), (?<year>20\d\d)/i);

This would match 'Apr 14, 2018', 'Jun 25, 2019', and so on.

I've made a number of attempts constructing with:

  • new RegExp(/my-pattern/i)
  • new RegExp('my-pattern' + 'my-other-pattern', 'i')
  • new RegExp(new RegExp('my-pattern', 'i') + new RegExp('other-pattern', 'i')) (this one feels the silliest).

One strange effect I noticed was that when I tried to build the pattern string by concatenation, the constructor would clip the output. See how the 'month' named group is altered below:

var z = new RegExp('\b(?<month>\bjan(?:uary)?\b|\bfeb(?:ruary)?\b|\bmar(?:ch)?\b|\bapr(?:il)?\b|\bmay\b|\bjun(?:e)?\b|\bjul(?:y)?\b|\baug(?:ust)?\b|\bsep(?:tember)?\b|\boct(?:ober)?\b|\bnov(?:ember)?\b|\bdec(?:ember)?\b)' + '(?<day>\d{1,2})', 'i');
undefined

>>> (?<monthjan(?:uary)feb(?:ruary)mar(?:ch)apr(?:il)majun(?:e)jul(?:y)aug(?:ust)sep(?:tember)oct(?:ober)nov(?:ember)dec(?:ember))(?<day>d{1,2})/i

Can anyone advise on the best approach for this? Otherwise I'm likely to declare the months/days/years matchers over and over again in very verbose patterns.

Thanks

Eats Indigo
  • If you are trying to use a backslash in a string literal, you need to escape it - `"\\b"`. Otherwise the string literal will interpret the escape sequence and then pass *that result* to the RegExp constructor. – VLAZ May 10 '19 at 16:22
  • Urf! Thanks @VLAZ - fixed it right away by escaping all of my escapes. – Eats Indigo May 10 '19 at 16:25
  • Yeah, common problem with strings and regexes - `/\d/` means a digit in a regex literal but the *string* literal `"\d"` results in the string with a content of the character `"d"`. So if you turn `/\d/` into `new RegExp("\d")` you are actually making the pattern `/d/` – VLAZ May 10 '19 at 16:27
  • Possible duplicate of [Why do regex constructors need to be double escaped?](https://stackoverflow.com/questions/17863066/why-do-regex-constructors-need-to-be-double-escaped) – VLAZ May 10 '19 at 16:32
  • OK, so I went around searching for a similar issue and this is the best one I found. It's not an exact dupe but there doesn't appear to be a good canonical answer for your situation. There have been others but you can hardly find the questions because their titles are not descriptive enough. I think I might try to do more searching for a good canonical and failing that, make my own because this situation is really annoying and hard to find questions/answers about. – VLAZ May 10 '19 at 16:34
  • Also, just noticed this but I am not sure why you got downvoted. I suspect it's because somebody assumed you should have known using a single backslash in a string literal shouldn't be done but as I said, that's super hard to find. As far as I know everybody just encounters this problem at some point and has to spend some gruelling time figuring out what the difference between a string literal and string content is (not immediately obvious) or have somebody point it out to them. The latter is very common and shouldn't deserve to be discouraged. – VLAZ May 10 '19 at 16:37
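The difference between a string literal and its content, as described in the comments above, can be checked directly. This is a minimal sketch; the variable names are only illustrative, and `String.raw` is shown as one possible alternative to doubling every backslash.

```javascript
// In a string literal, "\b" is the backspace character (U+0008)
// and "\d" is just the letter "d"; neither keeps its backslash.
const single = '\b\d';
const doubled = '\\b\\d';      // each "\\" yields one literal backslash
const raw = String.raw`\b\d`;  // raw template: backslashes are kept as typed

console.log(single.length);    // 2 (backspace + "d")
console.log(doubled);          // \b\d
console.log(raw === doubled);  // true

// The same effect inside the RegExp constructor:
const broken = new RegExp('(?<day>\d{1,2})');  // pattern is really (?<day>d{1,2})
const fixed = new RegExp('(?<day>\\d{1,2})');  // pattern is (?<day>\d{1,2})

console.log(broken.source);      // (?<day>d{1,2})
console.log(broken.test('14'));  // false
console.log(fixed.test('14'));   // true
```

This is also why the month group in the question appears clipped: every `\b` in the string became a backspace character before the constructor ever saw it.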

1 Answer

This expression might help you to match your desired date strings.

((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}

You can simplify it and reduce the boundaries if you wish.

RegEx Descriptive Graph

This graph visualizes the expression, and if you want, you can test other expressions in this link:

[Image: graph of the expression]

JavaScript Test

const regex = /((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}/gm;
const str = `Apr 14, 2018`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
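The test above uses one monolithic literal. For the original question of building a pattern from smaller components, one possible approach is to keep each component as a regex literal and join their `.source` strings. The sketch below makes some assumptions: `combine` is a hypothetical helper, not a built-in; the month matcher is a simplified version of the one in the question (the redundant inner `\b` anchors are dropped); and named groups require an engine that supports them, which Node v10 does.

```javascript
// Components as regex literals, so no double escaping is needed.
// Simplified month matcher (assumption: inner \b anchors from the question removed).
const month = /(?<month>jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)/;
const day = /(?<day>\d{1,2})/;
const year = /(?<year>20\d\d)/;

// Hypothetical helper: accepts RegExp objects or plain pattern strings.
// RegExp parts contribute their .source; their own flags are ignored,
// and the flags for the combined pattern are passed once.
function combine(parts, flags) {
  const source = parts
    .map((part) => (part instanceof RegExp ? part.source : part))
    .join('');
  return new RegExp(source, flags);
}

// Equivalent to /\b(?<month>...) (?<day>\d{1,2}), (?<year>20\d\d)\b/i
const dateRe = combine([/\b/, month, ' ', day, ', ', year, /\b/], 'i');

const m = dateRe.exec('Released on Apr 14, 2018.');
console.log(m.groups.month, m.groups.day, m.groups.year); // Apr 14 2018
```

Plain strings passed to `combine` are treated as pattern text, so any backslashes or metacharacters in them still need escaping. The same components can be reused in other combinations, provided each group name appears only once per combined pattern.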

Basic Performance Test

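This JavaScript snippet reports the runtime of a `for` loop that runs the replacement `repeat` times. `repeat` is set to 1 below; raise it (for example to 1,000,000) for a meaningful benchmark.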

const repeat = 1; // number of iterations to time
const start = Date.now();
let match; // result of the last replacement, read after the loop

// Run the replacement exactly `repeat` times.
for (let i = 0; i < repeat; i++) {
  const string = 'Apr 14, 2018';
  const regex = /(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})/gm;
  match = string.replace(regex, "Group #1: $1");
}

const end = Date.now() - start; // elapsed milliseconds
console.log("YAAAY! \"" + match + "\" is a match  ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test.  ");
Emma