
I'm currently in the process of creating a regex that parses for the following format: "January 20, 2009 – January 20, 2017"

However, although the pattern matches correctly on RegEx101, it does not match in JavaScript.

var text = "January 20, 2009 – January 20, 2017";
alert(text);
var replacedText = text.replace(/(January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d[ ][\p{Pd}][ ](January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d/gi,'Replace Me');

alert(replacedText);

I'm curious what my mistake is. The problem seems to arise when the dash character is evaluated. For people who don't want to run my code, here is the RegEx:

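/(January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d[ ][\p{Pd}][ ](January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d/gi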

I believe [\p{Pd}] is the part that causes the RegEx to fail.
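One likely factor, sketched here on the assumption of an ES2018 or later engine: Unicode property escapes such as `\p{Pd}` are only recognized when the regex carries the `u` flag. Without it, `[\p{Pd}]` is an ordinary character class containing the literal characters `p`, `{`, `P`, `d` and `}`. The shortened pattern and the variable names below are illustrative only, not the full expression from the question.

```javascript
// Illustrative sample; "\u2013" is an en dash written as an escape.
const sample = "2009 \u2013 2017";

// Without the u flag, \p is just "p", so the class matches p, {, P, d or }.
const withoutU = /\d{4} [\p{Pd}] \d{4}/;
// With the u flag, \p{Pd} matches any character in the Dash_Punctuation category.
const withU = /\d{4} \p{Pd} \d{4}/u;

console.log(withoutU.test(sample));        // false
console.log(withU.test(sample));           // true
console.log(withoutU.test("2009 p 2017")); // true
```

RegEx101 defaults to a PCRE flavor, where `\p{Pd}` works without an extra flag, which may explain why the pattern appears to work there.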

I should note that I am currently testing this code in a content script for a Chrome extension. In addition, when the above code runs in the extension, the alert box prints the following:

"January 20, 2009 – January 20, 2017"

Edit: I modified the RegEx to include the en dash and em dash Unicode characters, and the code is still not working. This is the new version I've come up with:

var text = "January 20, 2009 – January 20, 2017";
alert(text);
var replacedText = text.replace(/(January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d[ ][\u2013\u2014\-][ ](January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]([1-9]|[12][0-9]|3[01])[ ,][ ]\d\d\d\d/gi,'Replace Me');
alert(replacedText);

The portion in question is now [\u2013\u2014\-].

Edit 2: It appears the new code works in the "Run code snippet" box but not in a Chrome content script.
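A difference between the snippet runner and a content script could come from how the script file itself is decoded. That is an assumption, not something confirmed here. A minimal sketch for checking it: write the dash as an escape so the source stays pure ASCII, and dump the code points of a string to see what the script actually received. The helper name `codePoints` and the shortened pattern are hypothetical.

```javascript
// Hypothetical helper: list each character's code point in U+XXXX form.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// The en dash is written as \u2013, so the file contains only ASCII bytes
// and its meaning does not depend on the encoding the file is loaded with.
const dashText = "2009 \u2013 2017";
const dashPattern = /\d{4} [\u2013\u2014-] \d{4}/;

console.log(codePoints(" \u2013 ").join(" ")); // U+0020 U+2013 U+0020
console.log(dashPattern.test(dashText));       // true
```

If the same dump applied to a string literal containing a raw en dash shows something other than `U+2013`, the file is probably being read in a different encoding than the one it was saved in.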

  • @CertainPerformance despite the fact I've told the regex to search for the specific unicode characters, the expression is still not running. Maybe I'm misunderstanding your solution? – JJ Thompson Jan 23 '19 at 01:18
  • As you can see in the snippet, it looks to be working. (also note that character sets that match only a single character are useless - just match the single character without the set). Is the code here exactly the same as what you're trying in the content script? – CertainPerformance Jan 23 '19 at 01:31
  • You can fix this with a BOM, but I'm not sure that's intended. – Josh Lee Jan 23 '19 at 02:59
  • This is _infuriating_, I can't even get Chrome to reliably load my extension script as either windows-1252 or UTF-8. It keeps changing. – Josh Lee Jan 23 '19 at 16:49
