UPDATE: I am no longer specifically in need of the answer to this question - I was able to solve the (larger) problem I had in an entirely different way (see my comment). However, I'll check in occasionally, and if a viable answer arrives, I'll accept it. (It may take a week or three, though, as I'm only here sporadically.)
I have a string. It may or may not have HTML tags in it. So, it could be:
'This is my unspanned string'
or it could be:
'<span class="someclass">This is my spanned string</span>'
or:
'<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>'
or:
'<span class="no-text"><span class="silly-example"></span></span><span class="some-class">This is my spanned string</span>'
I want to find the index of a substring, but only in the portion of the string that, if the string were turned into a DOM element, would be (a) TEXT node(s). In the examples, that means only the part of the string that holds the plain text, e.g. "This is my spanned string".
However, I need the location of the substring in the whole string, not only in the plain text portion.
So, if I'm searching for "span" in each of the strings above:
- searching the first one will return 13 (0-based),
- searching the second will skip the opening "span" tag in the string and return 35 for the string "span" in the word "spanned",
- searching the third will skip the empty "span" tag and the openings of the two nested "span" tags, and return 91,
- searching the fourth will skip the nested "span" tags and the opening of the second "span" tag, and return 100.
I don't want to remove any of the HTML tags, I just don't want them included in the search.
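To make the expected behaviour concrete, here is a naive sketch of the kind of function I mean. It is not a real HTML parser: the name `indexOfInText` is made up for illustration, and it assumes very simple markup (no "<" or ">" inside attribute values or text, no comments, no script or style blocks). It also assumes a match must lie inside a single text run rather than span across tags.

```javascript
// Hypothetical helper (not from any library): find `needle` in `html`,
// skipping everything between '<' and '>'. Returns the 0-based index in the
// ORIGINAL string, or -1 if the needle only occurs inside tags (or not at all).
// Assumption: simple markup with no '<' or '>' in attribute values or text.
function indexOfInText(html, needle) {
  if (needle === "") return -1;
  let pos = 0; // start of the current text run
  while (pos < html.length) {
    const open = html.indexOf("<", pos);
    const runEnd = open === -1 ? html.length : open;
    // Search only inside the current text run [pos, runEnd).
    const hit = html.slice(pos, runEnd).indexOf(needle);
    if (hit !== -1) return pos + hit;
    if (open === -1) break; // no more tags, nothing left to search
    const close = html.indexOf(">", open);
    if (close === -1) break; // unterminated tag: give up
    pos = close + 1; // the next text run starts right after the tag
  }
  return -1;
}

console.log(indexOfInText('This is my unspanned string', 'span')); // 13
console.log(indexOfInText('<span class="someclass">This is my spanned string</span>', 'span')); // 35
```

With the third and fourth example strings above, this returns 91 and 100 respectively. It would break on anything less tidy than these examples, which is why I suspect a proper parser is the right tool.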
I'm aware that attempting to use regex is almost certainly a bad idea, probably even for strings as simple as the ones my code will be encountering, so please refrain from suggesting it.
I'm guessing I will need to use an HTML parser (something I've never done before). Is there one with which I can access the original parsed strings (or at least their lengths) for each node?
Might there be a simpler solution than that?
I did search around and wasn't able to find anyone who had asked this particular question before, so if someone knows of something I missed, I apologize for my faulty search skills.