This JS regex error is killing me - one correct match out of three and one false match.

If it makes a difference I am writing my script in Google Apps Script.

I have an XML-formatted string in which I want to match three date nodes, as follows:

<dateCreated>1619155581543</dateCreated>
<dispatchDate>1619478000000</dispatchDate>
<deliveryDate>1619564400000</deliveryDate>

I don't care about the tags so much - I just need enough to reliably replace them. I am using this regular expression:

var regex = new RegExp('[dD]ate(.{1,})?>[0-9]{13,}</');

These are the matches:

  1. dateCreated>1619155581543</
  2. Created

Obviously I understand number 1 - I wanted that. But I do not understand how 2 was matched. Also why were dispatchDate and deliveryDate not matched? All three targets are matched if I use the above regex in BBEdit and on https://ihateregex.io/playground and neither of those match "Created".

I've also tried this regular expression without success:

var regex = new RegExp('[dD]ate.{0,}>[0-9]{13,}</');

If you can't answer why my regex fails but you can offer a working solution I'd still be happy with that.

Dan
  • You get partial matches starting at `date` until `</`. You could match the strings like this `<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>` https://regex101.com/r/nfdtq0/1 but if you can use an xml parser, that might be better and less error prone. – The fourth bird Apr 23 '21 at 16:40
  • Thanks for this. Your regex is similarly working on iHateRegex but not in my script -weird. I'm writing this in Google Apps Script so I wonder if there is some bug behind the scenes? Since GAS has built in XMLservice I will see if I can easily do what I need by parsing the xml. – Dan Apr 23 '21 at 18:32
  • You can use it like `var regex = new RegExp('<([^<>]*[dD]ate[^<>]*)>\\d{13}<\\/\\1>');` or `var regex = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;` – The fourth bird Apr 23 '21 at 18:34

1 Answer

The first pattern that you tried [dD]ate(.{1,})?>[0-9]{13,}</ matches:

  • [dD]ate Match date or Date
  • (.{1,})? Optional capture group, match 1+ times any char (This group will capture Created)
  • > Match literally
  • [0-9]{13,} Match 13 or more digits 0-9
  • </ Match literally

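What you will get are partial matches from date till </, and the first capture group will contain Created. Without the g flag, methods such as match() return only the first match followed by its capture groups, which would explain why the result lists one match and then Created instead of all three nodes.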

The second pattern is almost the same, except for {0,} which matches 0 or more times, and there is no capture group.

Still this will give you partial matches.
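
As a rough illustration of the behaviour described above, the sketch below runs the first pattern with and without the g flag. The sample input (the three nodes joined by newlines) and the use of match() are assumptions, since the question does not show how the regex was applied.

```javascript
// Sample input assumed for illustration: the three nodes on separate lines.
const xml = [
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
].join("\n");

// No g flag: match() returns the first match followed by its capture groups.
const single = xml.match(new RegExp("[dD]ate(.{1,})?>[0-9]{13,}</"));
console.log(single[0]); // dateCreated>1619155581543</
console.log(single[1]); // Created

// With the g flag: match() returns every full match and no capture groups.
const all = xml.match(new RegExp("[dD]ate(.{1,})?>[0-9]{13,}</", "g"));
console.log(all);
```

Even with the g flag the matches are still partial: they start at date or Date rather than at the opening <.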


To match the whole element, you could either harness the power of an XML parser (which would be the recommended way) or use a pattern that assumes only digits between the tags and no < or > chars inside the opening and closing tags.

Note that this is a brittle solution.

<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>
  • < Match literally
  • ( Capture group 1 (used for the backreference \1 at the end of the pattern)
    • [^<>]* Match 0+ times any character except < or >
    • [dD]ate[^<>]* Match either date or Date, followed by 0+ times any char except < or >
  • ) Close group 1
  • > Match literally
  • \d{13} Match 13 digits (or \d{13,} for 13 or more)
  • <\/\1> Match </ then a backreference to the exact text that is captured in group 1 (to match the name of the closing tag) and then match >
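
Since the goal in the question is to replace the date nodes, here is a minimal sketch of how the pattern might be used with replace(). It assumes the same sample input as the question, adds a second capture group around the digits (not part of the pattern above), and uses a made-up transformation from epoch milliseconds to an ISO 8601 string purely as an example.

```javascript
// Sample input assumed for illustration.
const xml = [
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
].join("\n");

// The g flag makes replace() visit every date node, not just the first.
// Group 1 is the tag name (reused by \1), group 2 is the 13-digit value.
const dateNode = /<([^<>]*[dD]ate[^<>]*)>(\d{13})<\/\1>/g;

// Hypothetical transformation: epoch milliseconds -> ISO 8601 string.
const converted = xml.replace(dateNode, (whole, tag, millis) =>
  `<${tag}>${new Date(Number(millis)).toISOString()}</${tag}>`
);
console.log(converted);
```

The callback form of replace() receives the full match and then each capture group, so the tag name can be written back unchanged.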

A slightly more restrictive pattern allows only word characters \w around date:

<(\w*[dD]ate\w*)>\d{13}<\/\1>
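
To show where the two patterns differ, the sketch below tests both against a few tag names. The namespaced and hyphenated names are made up for illustration and do not come from the question.

```javascript
const loose = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;
const strict = /<(\w*[dD]ate\w*)>\d{13}<\/\1>/;

// Tag names made up for illustration.
const samples = [
  "<dateCreated>1619155581543</dateCreated>",
  "<ns:dispatchDate>1619478000000</ns:dispatchDate>",
  "<delivery-date>1619564400000</delivery-date>",
];

// Neither regex has the g flag, so test() keeps no state between calls.
const results = samples.map(s => ({
  sample: s,
  loose: loose.test(s),
  strict: strict.test(s),
}));
console.log(results);
```

Both patterns accept the plain tag name, while only the looser one accepts names containing : or -, because \w covers just letters, digits and underscore.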

const regex = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;
[
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
  "<thirteendigits>1619564400000</thirteendigits>",
].forEach(str => {
  const match = str.match(regex);
  console.log(match ? `Match --> ${str}` : `No match --> ${str}`)
});
The fourth bird