
I'm trying to tease out a date from a block of text. As far as I know, the date will always look similar to Mar 5, 2015 (three-letter month, day with no leading zeros, four-digit year).

The block of text is a little more variable, however. For the most part, it looks like this:

We understand that sometimes your travel plans change. We do not charge a change or cancel fee. However, this property (Hotel Name) imposes the following penalty to its customers that we are required to pass on: Cancellations or changes made after 11:59 AM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 10, 2015 are subject to a 1 Night Room & Tax penalty. The property makes no refunds for no shows or early checkouts.

Here's my attempt (val is the variable containing the string):

var valDate = val.match("\\\)\\\) on (.*)are");
return valDate[1];

As you can see, I went for the two )) at the end of the timezone (which I believe will always be there, regardless of EST/PST/etc) and the 'are' that immediately follows the date.

And this was working very well.... until one of my hotels passed the following:

We understand that sometimes your travel plans change. We do not charge a change or cancel fee. However, this property (Hotel Name) imposes the following penalty to its customers that we are required to pass on: cancellations or changes made before 6:00 PM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 15, 2015 are subject to a 1 Night Room & Tax penalty. Cancellations or changes made after 6:00 PM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 15, 2015 are subject to a 1 Night Room & Tax penalty. The property makes no refunds for no shows or early checkouts.

And my code returned:

Mar 15, 2015 are subject to a 1 Night Room & Tax penalty. Cancellations or changes made after 6:00 PM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 15, 2015

Which is somewhat less than desirable. I think I understand why this is happening, but try as I might, I can't fix it. Additionally, my original match is admittedly fumbly (hence this problem). I'm guessing there's probably a better way to tease out the date... I just have no idea how.

Can someone help me? I will be ever so grateful!

1 Answer


A pattern that can match the description you give is:

(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9]|[1-2][0-9]|3[0-1]),\s+\d{4}


That said, if you want to capture dates, I'm convinced you'd be better off using a library for it. A library is probably less error-prone, will support different date patterns, and is probably easier to customize.

You can loop over all the matches with `exec`, like this:

var r = /(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9]|[1-2][0-9]|3[0-1]),\s+\d{4}/g;
var t = "We understand that sometimes your travel plans change. We do not charge a change or cancel fee. However, this property (Hotel Name) imposes the following penalty to its customers that we are required to pass on: Cancellations or changes made after 11:59 AM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 10, 2015 are subject to a 1 Night Room & Tax penalty. The property makes no refunds for no shows or early checkouts.";
var m = r.exec(t);
while (m !== null) {
    // do something with m[0], the full matched date
    alert(m[0]); // example
    m = r.exec(t); // move on to the next match
}
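
If only the date strings themselves are needed, the same pattern can also be passed to `String.prototype.match`, which returns every match at once when the regex has the `g` flag. A minimal sketch, assuming the text arrives as a plain string like the question's `val`; the helper name `extractDates` and the shortened sample text are made up for illustration:

```javascript
// Hypothetical helper: returns every "Mon D, YYYY" date found in the text.
function extractDates(text) {
    var pattern = /(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9]|[1-2][0-9]|3[0-1]),\s+\d{4}/g;
    // With the g flag, match() returns an array of all matched substrings,
    // or null when nothing matches, so fall back to an empty array.
    return text.match(pattern) || [];
}

// Shortened, made-up variant of the two-sentence penalty text.
var sample = "changes made before 6:00 PM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 15, 2015 are subject to a penalty. " +
    "Cancellations or changes made after 6:00 PM ((GMT-05:00) Eastern Time (US & Canada)) on Mar 16, 2015 are subject to a penalty.";

var dates = extractDates(sample);
console.log(dates);    // all dates found, in order of appearance
console.log(dates[0]); // the first date, or undefined if there is none
```

Because the pattern describes the date itself rather than the text around it, it does not matter how many penalty sentences the hotel sends.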


Willem Van Onsem
  • This works perfectly - thank you so much for your help and advice! One question: you have the line `m = r.exec(t);` in there twice. Isn't only the first one necessary, or am I totally missing something? – Aaron Eckert Mar 16 '15 at 15:35
  • @AaronMickelson: the `/g` modifier makes `exec` iterate over **all matches**. The `while` loop tests each time whether there is still a match available. The JS regex engine keeps track of where `r` is positioned in `t`, so you need to call `exec` inside the while loop as well; otherwise `m` never changes and the program runs into an infinite loop. – Willem Van Onsem Mar 16 '15 at 16:28
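
The position tracking described in the last comment is exposed as the regex's `lastIndex` property. A small sketch on a made-up string, for illustration only, shows it advancing after each `exec` call and resetting to 0 once no match is left:

```javascript
var re = /\d{4}/g;            // g flag: exec resumes from re.lastIndex
var text = "2015 and 2016";   // made-up sample string
var seen = [];

var match = re.exec(text);
while (match !== null) {
    // lastIndex now points just past the match that was returned
    seen.push(match[0] + " ends at " + re.lastIndex);
    match = re.exec(text);    // without this call, match never changes
}

console.log(seen);          // one entry per match
console.log(re.lastIndex);  // reset to 0 after exec returned null
```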