
Is it possible to create a regex that retrieves all capturing groups from this kind of HTML input:

<em>word1</em> <em>word2</em> <em>word3</em>
prefix: <em>word4</em> <em>word5</em>
<em>word6</em> <em>word7</em>

so that it matches:

word4 word5

I have tried lookahead and lookbehind zero-length assertions, but with no success.

Here is my attempt:

https://regex101.com/r/lA9xA3/2

But I don't know how to make the group repeat for every occurrence following my 'prefix: '.

Thanks a lot,

Julien

darul75

1 Answer


You need to get each line that begins with the prefix and then extract the text inside its <em> tags.

This is better done in two passes in order not to compromise performance and readability:

// Pass 1: match each line starting with "prefix:" and capture its run of <em> elements
var re = /^prefix:((?: *<em>\w*<\/em>)*) */gm;
var str = 'prefix: <em>word1</em> <em>word2</em> <em>word3</em>\n<em>word4</em> <em>word5</em>\nprefix: <em>word6</em> <em>word7</em> <em>word8</em>';
var arr = [];
var m;                                      // Declared to avoid an implicit global

while ((m = re.exec(str)) !== null) {
  // Pass 2: within the captured run, take the text preceding each </em>
  var tmp = m[1].match(/[^<>]*(?=<\/em)/g); // Get matches inside EM
  if (tmp) {                                // If there are any
    tmp = tmp.filter(Boolean);              // Remove empty array elements
    for (var i = 0; i < tmp.length; i++) {
      arr.push(tmp[i]);                     // Add to resulting array
    }
  }
}
document.body.innerHTML = "<pre>" + JSON.stringify(arr, null, 4) + "</pre>";
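
As a rough sketch of the same two-pass idea applied to the input from the question, the passes can be wrapped in a small function. The name `emWordsAfterPrefix` and its parameters are hypothetical, introduced only for illustration. It also assumes, unlike the pattern above, that anything may follow the prefix on the line, and that the <em> elements contain plain text with no nested tags.

```javascript
// Hypothetical helper (not from the original answer): collect the text of
// every <em> element on lines that start with `prefix`.
function emWordsAfterPrefix(input, prefix) {
  // Escape regex metacharacters so the prefix is matched literally.
  var escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Pass 1: capture the remainder of each line that starts with the prefix.
  var lineRe = new RegExp('^' + escaped + '(.*)$', 'gm');
  var words = [];
  var line;
  while ((line = lineRe.exec(input)) !== null) {
    // Pass 2: pull the text out of each <em>...</em> on that line.
    var emRe = /<em>([^<>]*)<\/em>/g;
    var em;
    while ((em = emRe.exec(line[1])) !== null) {
      words.push(em[1]);
    }
  }
  return words;
}

// Input taken from the question.
var input = '<em>word1</em> <em>word2</em> <em>word3</em>\n' +
            'prefix: <em>word4</em> <em>word5</em>\n' +
            '<em>word6</em> <em>word7</em>';
console.log(emWordsAfterPrefix(input, 'prefix:')); // [ 'word4', 'word5' ]
```

Using a capturing group in the second pass avoids the empty matches that the lookahead version has to filter out, at the cost of a second `exec` loop.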
Wiktor Stribiżew