I need to split an HTML string at every occurrence of a "tag" element whose "type" attribute has the value "findMe". The element may carry other arbitrary attributes and arbitrary innerHTML.
Valid match: <tag type="findMe" any-other-attr="value">badabing</tag>
An example of the intended outcome:
Input:
some html text <br> with some formatting<tag id="1" type="findMe">sample text</tag> yada <tag id="2" type="dontFidMe">sample text</tag>yada
Output:
- [0]:
some html text <br> with some formatting
- [1]:
yada <tag id="2" type="dontFidMe">sample text</tag>yada
I've made some progress by building a regular expression to split the string, but it still has an issue: if there are adjacent "tag" elements and only one of them has the type attribute "findMe", the regular expression matches across both of them.
(?=<tag.*?type=(?:"|')findMe(?:"|').*?\/tag>)
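The over-matching most likely comes from `.*?`, which is lazy but can still run past the `>` of one opening tag and into the next element while looking for `type=`. A minimal sketch of a tighter pattern, written in Python for illustration: it assumes attribute values never contain `>`, that "tag" elements are not nested, and that the matched element itself should be dropped from the result. The names `FIND_ME` and `split_on_find_me` are hypothetical.

```python
import re

# Sketch only, not a general HTML parser.
# [^>]* keeps the attribute scan inside a single opening tag, \stype= avoids
# matching attributes such as data-type, and (?:(?!</tag>).)*? stops the body
# at the first closing </tag>.
FIND_ME = re.compile(
    r"""<tag(?:\s[^>]*)?\stype=(?:"findMe"|'findMe')[^>]*>(?:(?!</tag>).)*?</tag>""",
    re.DOTALL,
)


def split_on_find_me(html):
    """Split html at each matching element, dropping the element itself."""
    return FIND_ME.split(html)


sample = (
    'some html text <br> with some formatting'
    '<tag id="1" type="findMe">sample text</tag> yada '
    '<tag id="2" type="dontFidMe">sample text</tag>yada'
)
parts = split_on_find_me(sample)
for index, part in enumerate(parts):
    print(index, repr(part))
```

Wrapping the whole pattern in a lookahead, as in the original expression, would instead keep the matched element at the start of the following piece.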
I know I shouldn't parse HTML with regular expressions, but since I'm dealing with just one level of element depth and I know beforehand what to expect, I wonder which approach would be more efficient in terms of performance and memory:
- Parsing the HTML string into an in-memory DOM element, iterating over all the nodes, and splitting at "tag" elements whose type attribute has the value "findMe"?
OR
- Creating a regular expression to find all "tag" elements with the type attribute value "findMe"? (If so, any help improving the regular expression above is welcome.)
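For comparison, the parser-based option could look roughly like the sketch below. It is written in Python, whose standard library has no browser-style DOM for HTML, so the event stream of `html.parser.HTMLParser` stands in for the DOM walk. The class name `FindMeSplitter` and the function `split_with_parser` are hypothetical, and the sketch assumes the matched element itself is dropped from the result.

```python
from html.parser import HTMLParser


class FindMeSplitter(HTMLParser):
    """Records the (start, end) source span of every <tag type="findMe"> element."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Absolute index at which each line starts, so getpos() can be
        # converted from (line, column) to an index into the source string.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.spans = []
        self.depth = 0      # nesting depth inside a matched element
        self.start = None   # start index of the element being matched

    def _index(self):
        line, column = self.getpos()
        return self.line_starts[line - 1] + column

    def handle_starttag(self, tag, attrs):
        if tag != "tag":
            return
        if self.depth:
            self.depth += 1  # nested <tag> inside a matched element
        elif dict(attrs).get("type") == "findMe":
            self.start = self._index()
            self.depth = 1

    def handle_endtag(self, tag):
        if tag != "tag" or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The span ends just after the '>' that closes this end tag.
            end = self.source.index(">", self._index()) + 1
            self.spans.append((self.start, end))


def split_with_parser(html):
    """Split html around every matched element, dropping the element itself."""
    parser = FindMeSplitter(html)
    parser.feed(html)
    parser.close()
    parts, cursor = [], 0
    for start, end in parser.spans:
        parts.append(html[cursor:start])
        cursor = end
    parts.append(html[cursor:])
    return parts


sample = (
    'some html text <br> with some formatting'
    '<tag id="1" type="findMe">sample text</tag> yada '
    '<tag id="2" type="dontFidMe">sample text</tag>yada'
)
parts = split_with_parser(sample)
```

Which of the two options is faster or lighter on memory depends on the input and the runtime, so timing both on representative strings is the only reliable way to decide.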