I have the following data in a variable.

<ctx>
  <PostCode>XXXXXX</PostCode>
  <Title1>Mr</Title1>
  <Name>John</Name>
  <Order1>£100.00</Order1>
  <Order2>£100.01</Order2>
  <Date>10/10/2010</Date>
</ctx>

Using the following regex,

var payload = ctx.toString().match(/Order[1-9]/g);

I get the following results:

Order1,Order1,Order2,Order2

How can I make it return only Order1, Order2? It is also matching the closing tags, so each name is counted twice. I can't use <Order[1-9]> (the opening tag) because my application does not allow me to capture the angle brackets <>. Basically, I need a regex that returns unique values.

The following regex seems to work to some extent: (Order[0-9])(?!.*\1)

https://regex101.com/r/6QhFBg/1
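
As a rough illustration of why that pattern only works "to some extent": in JavaScript, `.` does not cross line breaks unless the `s` (dotAll) flag is set, so the negative lookahead only suppresses duplicates within a single line. The sketch below uses a hypothetical sample (not the original payload) in which Order1 appears on two separate lines; the variable names are illustrative only.

```javascript
// Hypothetical sample: Order1 appears on two different lines.
const sample = [
  '<Order1>£100.00</Order1>',
  '<Order2>£100.01</Order2>',
  '<Order1>£5.00</Order1>',
].join('\n');

// Without the `s` flag, the lookahead only scans to the end of the current
// line, so a name repeated on a later line is still returned again.
const perLine = sample.match(/(Order[0-9])(?!.*\1)/g);

// With the `s` flag, `.` also matches newlines, so the lookahead scans the
// rest of the string and each name is matched only at its last occurrence.
const wholeString = sample.match(/(Order[0-9])(?!.*\1)/gs);

console.log(perLine);     // [ 'Order1', 'Order2', 'Order1' ]
console.log(wholeString); // [ 'Order2', 'Order1' ]
```

Note that with the `s` flag the results come back in order of last occurrence, not first, which may or may not matter for a given use case.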

David Garcia

2 Answers

Never parse XML with regex. Wrong tool for the job – leads to brittle solutions.

Instead, use a real XML parser or XPath.

For example, this XPath,

//*[starts-with(local-name(), 'Order')]

will robustly select all elements whose name starts with "Order".

In JavaScript in the browser, XPath expressions are evaluated via document.evaluate:

// Use double quotes outside so the XPath string literal 'Order' can stay single-quoted.
var orders = document.evaluate("//*[starts-with(local-name(), 'Order')]", document,
                               null, XPathResult.ANY_TYPE, null);
var thisOrder = orders.iterateNext();

while (thisOrder) {
  console.log(thisOrder.textContent);
  thisOrder = orders.iterateNext();
}

See also How to use document.evaluate() and XPath to get a list of elements?

For parsing XML stored in a string, see for example:

kjhughes

let ctx = 
`<ctx>
  <PostCode>XXXXXX</PostCode>
  <Title1>Mr</Title1>
  <Name>John</Name>
  <Order1>£100.00</Order1>
  <Order2>£100.01</Order2>
  <Date>10/10/2010</Date>
</ctx>`;

let payload = ctx
  .match(/<Order[1-9]>/g) // e.g. <Order1>
  .map(o => o.substring(1, o.length - 1)); // trim first char `<` and last char `>`

console.log(payload);
junvar
  • Thanks, this would have worked if my application allowed me to capture the whole payload structure including the tag delimiters `<>`, but I can't use `<>`. – David Garcia Jan 30 '20 at 17:46
  • Did you read the next line (`.map(...`)? It removes the `<` and `>`. – junvar Jan 30 '20 at 17:59
  • Is there a regex variant that matches Order[1-9] but only returns unique values? – David Garcia Jan 30 '20 at 18:10
  • Regex isn't going to do operations like 'find uniques' or 'map integers to double their value'. But that's where `js` comes in. – junvar Jan 30 '20 at 18:29
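
The point made in the last comment can be sketched as follows: let the regex collect every match, including the ones from closing tags, and let JavaScript remove the duplicates with a `Set`. This is a minimal sketch, not code from either answer; `xml`, `allMatches`, and `uniqueOrders` are illustrative names, and the payload is a shortened version of the one in the question.

```javascript
// Shortened, illustrative copy of the payload from the question.
const xml = `<ctx>
  <Name>John</Name>
  <Order1>£100.00</Order1>
  <Order2>£100.01</Order2>
</ctx>`;

// The pattern contains no angle brackets, so it matches the names in both
// opening and closing tags. match() returns null when nothing matches,
// hence the fallback to an empty array.
const allMatches = xml.match(/Order[1-9]/g) ?? [];

// A Set keeps only the first occurrence of each value, in insertion order.
const uniqueOrders = [...new Set(allMatches)];

console.log(allMatches);   // [ 'Order1', 'Order1', 'Order2', 'Order2' ]
console.log(uniqueOrders); // [ 'Order1', 'Order2' ]
```

Because the deduplication happens after matching, the pattern itself stays simple and does not depend on how the tags are laid out across lines.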