I've constructed a regex using Regex101.

The content I'm scanning looks like this:

<p>I am a paragraph<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>.</p>

<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>

The Regex I'm using looks like this:

(<p>.*?)(<a href="#fn4".*?>.*?<\/a>)(.*?<\/p>)((?:\s*<div class="inline_footnote">.*?<\/div>)*)

In the preview online it matches the footnote divs as one group (as I intend):

[Screenshot: Regex101 preview]

In 'live' JavaScript, however, the fourth group is absent.

Demo code in JS:

let searching_in = `<p>I am a paragraph<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>.</p>

<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>
<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>`

let regex = new RegExp( `(<p>.*?)(<a href="#fn4".*?>.*?<\/a>)(.*?<\/p>)((?:\s*<div class="inline_footnote">.*?<\/div>)*)` )

console.log( regex.exec( searching_in ) )

Any help is appreciated, I have a headache now.

Mentor
  • Thanks for the check, it is, however, a separate issue. Unless you are saying I shouldn't be using RegEx to be parsing HTML, in which case I'm afraid that is a requirement of this codebase. – Mentor Jul 23 '18 at 12:02
  • On Regex101, you didn't change your selection of regexp flavor from "php" to "javascript". – ASDFGerte Jul 23 '18 at 12:03
  • Even when I do, the grouping happens correctly. See the 'match info' panel when in Js. I just kept the php highlighting because it makes it visually easier for SO. – Mentor Jul 23 '18 at 12:05
  • Ah, the more important part, you used the `RegExp` constructor but didn't double escape backslashes. – ASDFGerte Jul 23 '18 at 12:06
  • The backslashes are not literal characters, they are used to escape the characters behind them, which are forward slashes (except for \s which matches whitespace). – Mentor Jul 23 '18 at 12:08
  • Oh god that was the issue. Thank you ASDFGerte. – Mentor Jul 23 '18 at 12:12
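
The fix identified in the comments can be sketched as follows. This is a minimal illustration under stated assumptions, not the only way to write it: the names `footnote`, `searchingIn`, `broken`, `literal`, `raw`, `doubled` and `countFootnotes` are made up for this example and do not appear in the original post. The underlying point is that inside a template literal (or any ordinary string) `\s` is not a recognised string escape, so it collapses to a plain `s` before `RegExp` ever sees the pattern; `\/` collapses to `/`, which happens to be harmless.

```javascript
// Sample input rebuilt from the question: one paragraph, a blank line, four footnote divs.
const footnote = `<div class="inline_footnote"><sup class="inline_sup">[4]</sup>I am a footnote with the paragraph</a></div>`;
const searchingIn =
  `<p>I am a paragraph<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>.</p>\n\n` +
  Array(4).fill(footnote).join("\n");

// Broken (as in the question): the template literal turns "\s" into "s",
// so the pattern looks for zero or more letter "s" before "<div" instead of whitespace.
const broken = new RegExp(`(<p>.*?)(<a href="#fn4".*?>.*?<\/a>)(.*?<\/p>)((?:\s*<div class="inline_footnote">.*?<\/div>)*)`);

// Option 1: a regex literal, where backslashes go straight to the regex engine.
const literal = /(<p>.*?)(<a href="#fn4".*?>.*?<\/a>)(.*?<\/p>)((?:\s*<div class="inline_footnote">.*?<\/div>)*)/;

// Option 2: String.raw keeps backslashes intact in the string passed to RegExp.
const raw = new RegExp(String.raw`(<p>.*?)(<a href="#fn4".*?>.*?<\/a>)(.*?<\/p>)((?:\s*<div class="inline_footnote">.*?<\/div>)*)`);

// Option 3: double every backslash so one survives string parsing.
const doubled = new RegExp(`(<p>.*?)(<a href="#fn4".*?>.*?<\\/a>)(.*?<\\/p>)((?:\\s*<div class="inline_footnote">.*?<\\/div>)*)`);

// Helper for this example only: how many footnote divs ended up in capture group 4?
function countFootnotes(regex) {
  const match = regex.exec(searchingIn);
  return match ? (match[4].match(/inline_footnote/g) || []).length : -1;
}

console.log(broken.exec(searchingIn)[4].length); // 0: group 4 exists but is empty
console.log(countFootnotes(literal));
console.log(countFootnotes(raw));
console.log(countFootnotes(doubled));
```

The regex literal is probably the simplest choice when the pattern is fixed; `String.raw` or doubled backslashes are mainly useful when the pattern has to be assembled from strings at runtime, for example if the footnote id (`fn4` here) varies.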