<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of <SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN></DIV>
    <DIV><SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN></DIV>
    <DIV><SPAN CLASS="dt25 ll2"> </SPAN><SPAN></DIV>
    <DIV><SPAN CLASS="dt26 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="dt27 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="jl4">UTLINE OF </SPAN>M<SPAN CLASS="jl4">ARK</SPAN> </SPAN></DIV>
    <DIV><SPAN CLASS="dt29 ll2"> </SPAN></DIV>
    <DIV><SPAN CLASS="dt30 ll2"> </SPAN></DIV>

I'm trying to match entire SPAN elements here without capturing another SPAN's opening tag. This regex clearly fails:

<SPAN.*?>(.*?)<\/SPAN>

An example result of the regex above is this:

<SPAN CLASS="ps23 ft0">A suggestion: for the <SPAN CLASS="em2">quickest</SPAN>

This is undesirable. The regex I've written so far to try to fix it is:

<SPAN.*?>(.*?(?!<SPAN>.*?).)<\/SPAN>

And it fails miserably.

QU1JL
  • Not exactly a duplicate; the question is more about backtracking. An answer: `]*>(?:(?!`. Indeed, you must understand how backtracking works to use a lazy quantifier correctly. https://regex101.com/r/mGhjCx/1 – Nahuel Fouilleul Oct 31 '18 at 08:17
  • @NahuelFouilleul thanks! – QU1JL Oct 31 '18 at 08:30
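
For readers following the backtracking point, here is a minimal sketch of one way to keep a match from running across another SPAN tag. It is not the pattern from the comment (which is cut off above), and it is not a general HTML parser. It assumes attribute values never contain `>`, and the helper name `innermostSpans` and the shortened `sample` string are made up for illustration. The idea is to consume the content one character at a time, checking with a negative lookahead before each character that no opening or closing SPAN tag starts there, so only SPANs without nested SPANs can match.

```javascript
// Hypothetical helper (not from the original thread): returns the inner text
// of every SPAN element that contains no nested SPAN.
function innermostSpans(html) {
  // <SPAN\b[^>]*>            an opening SPAN tag with any attributes
  // (?:(?!<\/?SPAN\b)[\s\S])* any character, as long as no SPAN open/close tag starts here
  // <\/SPAN>                 the closing tag that belongs to this SPAN
  const pattern = /<SPAN\b[^>]*>((?:(?!<\/?SPAN\b)[\s\S])*)<\/SPAN>/gi;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Shortened version of the markup from the question (assumption: well-formed tags).
const sample =
  '<DIV><SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> ' +
  'overview of <SPAN CLASS="jl2">Mark</SPAN>, first read</SPAN></DIV>' +
  '<DIV><SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN></DIV>';

// Logs the three leaf contents: "quickest", "Mark", "the individual outline titles. "
console.log(innermostSpans(sample));
```

Note that the outer `dt23 ll0` SPAN is skipped entirely, because its content contains other SPAN tags. Matching nested elements as a whole is beyond what a pattern like this can do reliably.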

1 Answer


Don't use regex on HTML. Use DOM manipulation instead:

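// Collect every <span> element in the document (nested ones included).
const spans = [...document.querySelectorAll("span")];
// textContent returns each span's text, including the text of any nested spans.
const spanContent = spans.map((span) => span.textContent);

console.log(spans);
console.log(spanContent);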
<DIV>
  <SPAN CLASS="dt23 ll0">A suggestion: for the <SPAN CLASS="jl2">quickest</SPAN> overview of
  <SPAN CLASS="jl2">Mark</SPAN>, first read all the Division titles (I, II, III, etc.), then come back and read </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt24 ll0">the individual outline titles. </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt25 ll2"> </SPAN>
  <SPAN></DIV>
    <DIV><SPAN CLASS="dt26 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt27 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="jl4">UTLINE OF </SPAN>M
  <SPAN CLASS="jl4">ARK</SPAN> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt29 ll2"> </SPAN>
</DIV>
<DIV>
  <SPAN CLASS="dt30 ll2"> </SPAN>
</DIV>
mplungjan