The code below is supposed to return 5 matches, but it only returns one.

    // The markup is concatenated into a single line, because a quoted
    // JavaScript string literal cannot span several source lines.
    var str = '"<div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="green" data-offcolor="maroon" data-paramname="rxAntMER" data-index="9" data-blockindex="0"></div>' +
        '<div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="green" data-offcolor="maroon" data-paramname="rxAntMER" data-index="8" data-blockindex="0"></div><div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="green" data-offcolor="maroon" data-paramname="rxAntMER" data-index="7" data-blockindex="0"></div>' +
        '<div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="green" data-offcolor="maroon" data-paramname="rxAntMER" data-index="6" data-blockindex="0"></div><div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="yellow" data-offcolor="maroon" data-paramname="rxAntMER" data-index="5" data-blockindex="0"></div>' +
        '<div id="rxAntMER" class="LEDPill" data-hookableby="globalid" data-oncolor="yellow" data-offcolor="maroon"';
    // Expected 5 matches, but this reports 1.
    var results = str.match(/id="rxAntMER".+data-blockindex="0"/g);
    alert("Number of matches = " + results.length);

The regex is trying to accomplish the following:

  1. Match the literal 'id="rxAntMER"'

  2. followed by one or more of any character

  3. up to the literal 'data-blockindex="0"'

There are 5 such matches in the provided text. I've tried samples, tutorials, and many permutations using RegExp(...) and string.match(...), but I can't get the results I'm looking for.

Any suggestions or ideas as to what I'm doing wrong?

fuzzlog
  • `id`s must be unique. [You can't parse HTML with regex](http://stackoverflow.com/a/1732454/1529630). – Oriol Dec 31 '14 at 22:37
  • Convert it to non-greedy: change that part of the regex to `.+?` or to `[\S\s]+?` –  Dec 31 '14 at 22:38
  • @Oriol, When ids aren't unique, you just have to make sure that the id plus another attribute in the element make for a "unique" combination. The text presented above is not a scraping of HTML, it's an intermediary step to create proper HTML for my purposes (too long to explain). Suffice to say that it's not a typical scenario found in the wild, but something I created that requires several steps... – fuzzlog Jan 03 '15 at 00:58
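
The two alternatives suggested in the comments, `.+?` and `[\S\s]+?`, differ only when the text contains line breaks: in JavaScript, `.` does not match a newline unless the `s` flag is set, while `[\S\s]` matches any character. A minimal sketch, using a made-up two-line string (`multiline` and its attributes are illustrative, not the asker's data):

```javascript
// Hypothetical input: the closing attribute sits on a second line.
var multiline = 'id="x" a="1"\nb="2" end="0"';

// "." cannot cross the "\n", so the lazy dot pattern finds nothing here.
var dotFound = /id="x".+?end="0"/.test(multiline);

// [\S\s] matches every character, newlines included.
var classFound = /id="x"[\S\s]+?end="0"/.test(multiline);

console.log(dotFound);   // false
console.log(classFound); // true
```

If the markup is known to sit on a single line, as in the question, both forms should behave the same.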

1 Answer

`+` is greedy. As a reduced example, suppose you wanted to find how many angle-bracket pairs there are in `<1><2>`. You search with the regular expression `<.+>`, expecting it to match both pairs. But `.+` matches `1><2`, so you get only one match.

You need to make `+` less greedy. Do that by following it with a `?`, so you have `.+?`.
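
As a rough sketch of the difference, the snippet below runs the greedy and the lazy pattern against the reduced `<1><2>` example and against a shortened stand-in for the markup in the question. The variable names and the trimmed-down `markup` string are assumptions made for illustration:

```javascript
// Reduced example: count angle-bracket pairs in "<1><2>".
var sample = '<1><2>';
var greedy = sample.match(/<.+>/g);  // .+ runs on to the last ">"
var lazy = sample.match(/<.+?>/g);   // .+? stops at the first ">"

console.log(greedy); // ["<1><2>"]
console.log(lazy);   // ["<1>", "<2>"]

// Shortened stand-in for the question's markup (two elements, fewer attributes).
var markup = '<div id="rxAntMER" data-index="9" data-blockindex="0"></div>' +
    '<div id="rxAntMER" data-index="8" data-blockindex="0"></div>';

var greedyCount = markup.match(/id="rxAntMER".+data-blockindex="0"/g).length;
var lazyCount = markup.match(/id="rxAntMER".+?data-blockindex="0"/g).length;

console.log(greedyCount); // 1
console.log(lazyCount);   // 2
```

The only change between the two patterns is the `?` after `+`.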

icktoofay
  • Sorry for taking long to accept, the holidays got in the way. One question regarding the greediness of "+", why does the addition of '\sdata-blockindex="0"' in my original pattern not stop "+" from continuing to match "any character"? – fuzzlog Jan 02 '15 at 19:14
  • @fuzzlog: Because by nature of its being greedy, it doesn’t stop as soon as it can. You can think of the regular expression as trying both possibilities simultaneously: continue the `+`, or try to match the part after the `+`. Even after the second possibility succeeds, it still goes on trying the first possibility (extending the `+`), and eventually that one will succeed. It then has a few possible valid matches to choose from, and because `+` is greedy, it chooses the one with the longest match. – icktoofay Jan 02 '15 at 21:21
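
To see where each quantifier ends up, it can help to look at the matched text itself rather than the match count. The sketch below uses a shortened, made-up string with the same shape as the question's markup (`text`, `greedyMatch`, and `lazyMatch` are illustrative names):

```javascript
// Two candidate end points: data-blockindex="0" appears twice.
var text = 'id="rxAntMER" data-index="9" data-blockindex="0" ' +
    'id="rxAntMER" data-index="8" data-blockindex="0"';

// Greedy: the match extends to the LAST data-blockindex="0".
var greedyMatch = /id="rxAntMER".+data-blockindex="0"/.exec(text)[0];

// Lazy: the match stops at the FIRST data-blockindex="0".
var lazyMatch = /id="rxAntMER".+?data-blockindex="0"/.exec(text)[0];

console.log(greedyMatch === text); // true
console.log(lazyMatch);            // id="rxAntMER" data-index="9" data-blockindex="0"
```

Both results are valid matches for their pattern; the quantifier only decides which of the possible end points is preferred.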