
First off, don't link to the "Don't parse HTML with Regex" post :)

I've got the following HTML, which is used to display prices in various currencies, including and excluding tax:

<span id="price_break_12345" name="1">
    <span class="price">
        <span class="inc" >
            <span class="GBP">£25.00</span>
            <span class="USD" style="display:none;">$34.31</span>
            <span class="EUR" style="display:none;">27.92&nbsp;€</span>
        </span>
        <span class="ex"  style="display:none;">
            <span class="GBP">£20.83</span>
            <span class="USD" style="display:none;">$34.31</span>
            <span class="EUR" style="display:none;">23.27&nbsp;€</span>
        </span>
    </span>
    <span style="display:none" class="raw_price">25.000</span>
</span>

An AJAX call returns a single string of HTML containing multiple copies of the above HTML, each with different prices. What I'm trying to match with regex is:

  • Each block of the above HTML (as mentioned, it occurs multiple times in the return string)
  • The value of the name attribute on the outermost span

What I have so far is this:

var price_regex = new RegExp(/(<span([\s\S]*?)><span([\s\S]*?)>([\s\S]*?)<\/span><\/span\>)/gm);
console && console.log(price_regex.exec(product_price));

It matches the first price break once for each price break that occurs (so if there's name=1, name=5 and name=15, it matches name=1 three times).

Whereabouts am I going wrong?

Joe

2 Answers


So, if you can count on the format of that first span in each block like this:

<span id="price_break_12345" name="1">

Then, how about using code like this to cycle through all the matches? It finds the price_break_xxxx id in that first span and captures the name attribute that follows it:

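// str holds the HTML string returned by the AJAX call
var re = /id="price_break_\d+"\s+name="([^"]+)"/g;
var match;
// with the g flag, each exec() call resumes after the previous match and returns null when there are no more
while ((match = re.exec(str)) !== null) {
    console.log(match[1]); // the captured name value
}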

You can see it work here: http://jsfiddle.net/jfriend00/G39ne/.

I used a converter to turn three of your HTML blocks into a single JavaScript string (to simulate what your Ajax call returns) so I could run the code on it.
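For readers without the fiddle, here is a self-contained sketch of the same loop. The sample string is hypothetical (three abbreviated blocks with name values 1, 5 and 15, not the real AJAX response), and collectNames is just a name chosen for illustration.

```javascript
// Hypothetical stand-in for the AJAX response: three abbreviated price blocks
// concatenated into one string (the inner spans are trimmed for brevity).
var str =
    '<span id="price_break_12345" name="1"><span class="price">£25.00</span></span>' +
    '<span id="price_break_12346" name="5"><span class="price">£22.50</span></span>' +
    '<span id="price_break_12347" name="15"><span class="price">£20.00</span></span>';

// Collect the name attribute that follows each price_break_NNN id.
function collectNames(html) {
    // Created inside the function so lastIndex starts at 0 on every call.
    var re = /id="price_break_\d+"\s+name="([^"]+)"/g;
    var names = [];
    var match;
    // exec returns null once there are no more matches, which ends the loop
    while ((match = re.exec(html)) !== null) {
        names.push(match[1]); // match[1] is the first capture group
    }
    return names;
}

console.log(collectNames(str)); // logs the array ["1", "5", "15"]
```

Creating the regex inside the function matters: a g-flagged regex shared between calls keeps its lastIndex, so a second call could start partway through the string.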


A more robust way to do this is to let the browser's HTML parser do all the work for you. Assuming you have the HTML in a string variable named `str`, you can use the browser's parser like this:

function getElementChildren(parent) {
    var elements = [];
    var children = parent.childNodes;
    for (var i = 0, len = children.length; i < len; i++) {
        // collect element nodes only
        if (children[i].nodeType == 1) {
            elements.push(children[i]);
        }
    }
    return(elements);
}

var div = document.createElement("div");
div.innerHTML = str; // let the browser parse the HTML string into DOM nodes
var priceBlocks = getElementChildren(div);
for (var i = 0; i < priceBlocks.length; i++) {
    // log each outer span's id and its name attribute
    console.log(priceBlocks[i].id + ", " + priceBlocks[i].getAttribute("name"));
}

Demo here: http://jsfiddle.net/jfriend00/F6D8d/

This gives you real DOM elements, so you can use all the normal DOM traversal functions on them instead of running (somewhat brittle) regular expressions over HTML.

jfriend00
  • Because I also need to capture the entire price break it matched (one price break is one copy of the HTML block from the question). However, you just answered why it only returns the first one - I wasn't using `while`, I was just executing it once :) – Joe Feb 20 '12 at 07:04
  • I added a new way of doing this to my answer that uses the browser's HTML parser rather than regular expressions. – jfriend00 Feb 20 '12 at 15:06
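The comment above says the whole price break is needed as well as the name. One regex-only way to get both, assuming every block starts with `<span id="price_break_` and the response contains nothing but these blocks, is to match lazily up to the start of the next block (or the end of the string) using a lookahead. This is a sketch under that assumption, not a general HTML parser; splitPriceBreaks and the sample response are hypothetical.

```javascript
// Assumption: every block begins with <span id="price_break_NNN" name="...">
// and nothing but whitespace follows the last block.
function splitPriceBreaks(html) {
    // Group 1 = the name attribute. The lookahead ends each lazy match right
    // before the next block start (or at end of input) without consuming it.
    // No m flag, so $ means the end of the whole string, not of a line.
    var re = /<span id="price_break_\d+" name="([^"]*)">[\s\S]*?(?=<span id="price_break_|$)/g;
    var blocks = [];
    var match;
    while ((match = re.exec(html)) !== null) {
        blocks.push({ name: match[1], html: match[0].trim() });
    }
    return blocks;
}

var response =
    '<span id="price_break_1" name="1"><span class="price"><span class="GBP">£25.00</span></span></span>\n' +
    '<span id="price_break_2" name="5"><span class="price"><span class="GBP">£22.50</span></span></span>\n';

splitPriceBreaks(response).forEach(function (block) {
    console.log(block.name, block.html);
});
```

Anchoring on the start of the next block avoids guessing how many closing `</span>` tags end a block, which is where a lazy `[\s\S]*?<\/span><\/span>` tends to stop too early.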

Thanks in large part to jfriend for making me realise why my regex was matching in a strange way (while (price_break = regex.exec(string)) instead of just exec'ing it once), I've got it working:

var price_regex = new RegExp(/<span[\s\S]*?name="([0-9]+)"[\s\S]*?><span[\s\S]*?>[\s\S]*?<\/span><\/span\>/gm);
var price_break;
while (price_break = price_regex.exec(strProductPrice))
{
    console && console.log(price_break);
}

I had a ton of useless capturing groups (the extra ()s), which were just clogging up the result set, so stripping them out made things a lot simpler.
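To show why the extra parentheses cluttered the result: exec returns an array whose element 0 is the whole match, followed by one element per capturing group. The toy pattern below (not the real price regex) compares the two; where grouping is still needed, a non-capturing group `(?:...)` keeps it out of the result.

```javascript
var html = '<span name="1"><b>x</b></span>';

// Every (...) is a capturing group, so each one adds an element to the result.
// The outermost group wraps the whole pattern, so noisy[1] repeats noisy[0].
var noisy = /(<span( name="([0-9]+)")>([\s\S]*?)<\/span>)/.exec(html);
console.log(noisy.length); // 5: the whole match plus four groups

// Only the name is captured; (?:...) groups without capturing.
var lean = /<span(?: name="([0-9]+)")>[\s\S]*?<\/span>/.exec(html);
console.log(lean.length); // 2: the whole match plus the name
console.log(lean[1]);     // the string "1"
```

A group that wraps the entire pattern, as in the question's original regex, is exactly what makes the same text show up twice in one result array.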

The other thing, as mentioned above, was that originally I was just doing

price_break = price_regex.exec(strProductPrice)

which runs the regex once and returns the first match only (which I mistook for 3 copies of the first match, because the extra ()s added capture groups to the returned array). By looping, exec is called repeatedly, each time continuing from where the last match ended, until all the matches have been exhausted and it returns null. I had assumed it did this on its own, similar to PHP's preg_match_all.
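A small sketch of the difference, using a toy string rather than the real response: with the g flag, each exec call resumes from the regex's lastIndex, so one call gives one match and a loop walks through all of them. In engines that support ES2020, String.prototype.matchAll is the closest equivalent of PHP's preg_match_all, since it yields every match together with its capture groups.

```javascript
var text = 'name="1" name="5" name="15"';
var re = /name="([0-9]+)"/g;

// A single exec returns only the first match and advances re.lastIndex.
var first = re.exec(text);
console.log(first[1], re.lastIndex); // 1 8

// Reset, then loop: each call continues where the previous match ended.
re.lastIndex = 0;
var names = [];
var m;
while ((m = re.exec(text)) !== null) {
    names.push(m[1]);
}
console.log(names); // logs the array ["1", "5", "15"]

// ES2020+: matchAll requires a g-flagged regex and yields every match.
var viaMatchAll = Array.from(text.matchAll(/name="([0-9]+)"/g), function (x) {
    return x[1];
});
console.log(viaMatchAll); // logs the array ["1", "5", "15"]
```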

Joe
  • FYI, you don't generally use the `/regexhere/` syntax with `new RegExp()` (though it will still work). Use one or the other. So, your regex can be declared as: `var price_regex = /<span[\s\S]*?name="([0-9]+)"[\s\S]*?><span[\s\S]*?>[\s\S]*?<\/span><\/span\>/gm;` – jfriend00 Feb 20 '12 at 14:32
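To illustrate the comment: a regex literal and the RegExp constructor given a string produce the same pattern, but in the string form every backslash has to be doubled. Passing a literal to new RegExp works because the constructor simply copies the existing regex, which is why it is redundant rather than wrong. The pattern here is a shortened toy, not the full price regex.

```javascript
// Literal form: backslashes are written once.
var literal = /name="([0-9]+)"[\s\S]*?<\/span>/g;

// Constructor form: the pattern is a string, so each backslash is doubled.
var fromString = new RegExp('name="([0-9]+)"[\\s\\S]*?<\\/span>', 'g');

// Wrapping a literal in new RegExp just copies it (pattern and flags).
var wrapped = new RegExp(/name="([0-9]+)"[\s\S]*?<\/span>/g);

console.log(literal.source === fromString.source); // true
console.log(literal.source === wrapped.source);    // true
```

The constructor form is mainly useful when the pattern has to be built from a string at runtime; for a fixed pattern, the literal is shorter and avoids the double escaping.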