I'm supposed to split an HTML string at every occurrence of a "tag" element whose "type" attribute has the value "findMe". The element may carry other arbitrary attributes and arbitrary innerHTML.

Valid match: <tag type="findMe" any-other-attr="value">badabing</tag>

An example of the intended outcome:

Input:

some html text <br> with some formatting<tag id="1" type="findMe">sample text</tag> yada <tag id="2" type="dontFidMe">sample text</tag>yada

Output:

  • [0]: some html text <br> with some formatting
  • [1]: <tag id="2" type="dontFidMe"> yada yada

I've made some progress by building a regular expression to split the string, but it still has a problem: if there are adjacent "tag" elements and only one of them has the type attribute "findMe", the regular expression matches across both of them.

(?=<tag.*?type=(?:"|')findMe(?:"|').*?\/tag>)
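The over-matching can be reproduced with a short sketch. Python is an assumption here (the question does not name a language), and the sample string is a hypothetical one made up for illustration. The lazy `.*?` is still free to run past the first tag's closing `>`, so the lookahead also succeeds at the start of the tag that should not match.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'''(?=<tag.*?type=(?:"|')findMe(?:"|').*?\/tag>)''')

# Hypothetical sample: a non-matching tag directly followed by a matching one.
sample = '<tag type="other">a</tag><tag type="findMe">b</tag>'

# Offsets at which the lookahead succeeds. Only the second tag should qualify,
# but the first one is reported too, because `.*?` crosses into the second tag.
original_hits = [m.start() for m in ORIGINAL.finditer(sample)]
expected_hit = sample.index('<tag type="findMe"')

print(original_hits, expected_hit)
```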

I know I shouldn't parse HTML with regular expressions, but since I'm dealing with only one level of element depth and I know beforehand what to expect, I wonder which approach would be most efficient in terms of performance and memory:

  1. Parsing the HTML string into an in-memory DOM element, iterating over all the nodes, and splitting at the tag elements whose type attribute has the value "findMe"?

OR

  2. Creating a regular expression to find all tag elements whose type attribute has the value "findMe"? (If so, any help improving the regular expression above is welcome.)
pelican_george

1 Answer

One simple solution, I believe, would be to replace the . inside the tag with [^>], so that it matches anything except the closing >.

(?=<tag[^>]*?type=["']findMe["'])
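As a rough illustration, here is the pattern applied to the input from the question. The sketch assumes Python 3.7 or later, where `re.split` also splits on zero-width matches such as this lookahead; the helper name `split_at_findme` is made up for the example.

```python
import re

# The corrected lookahead: [^>] keeps the match inside a single start tag.
FIXED = re.compile(r'''(?=<tag[^>]*?type=["']findMe["'])''')

def split_at_findme(html):
    """Split html in front of every <tag ... type="findMe"> start tag."""
    return FIXED.split(html)

html = ('some html text <br> with some formatting'
        '<tag id="1" type="findMe">sample text</tag> yada '
        '<tag id="2" type="dontFidMe">sample text</tag>yada')

parts = split_at_findme(html)
for index, part in enumerate(parts):
    print(f'[{index}]: {part}')
```

Because the pattern is a lookahead, the matched element is not consumed: it stays at the start of the second piece. If the element itself should be dropped from the output, it would have to be removed in a separate step.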

See it here at regex101. (The substitution with "\n[break]\n" is only to illustrate the split)

Note, though, that an attribute value containing > (for example <tag someattr="123>456" type="findMe">) would break it. (That's one reason why regex normally isn't suitable for parsing HTML ;)
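For comparison, here is a minimal sketch of a parser-based split that tolerates such attributes. It assumes Python and uses the standard library's `html.parser.HTMLParser`, which is an event-based tokenizer rather than an in-memory DOM. The names `FindMeLocator` and `split_with_parser` are hypothetical, and nothing here measures performance or memory.

```python
import re
from html.parser import HTMLParser

FIXED = re.compile(r'''(?=<tag[^>]*?type=["']findMe["'])''')

class FindMeLocator(HTMLParser):
    """Records where each <tag type="findMe"> start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []  # (line number, column) pairs from getpos()

    def handle_starttag(self, tag, attrs):
        if tag == "tag" and dict(attrs).get("type") == "findMe":
            self.positions.append(self.getpos())

def split_with_parser(html):
    locator = FindMeLocator()
    locator.feed(html)
    locator.close()
    # getpos() reports 1-based lines and 0-based columns; convert to offsets.
    line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]
    cuts = [line_starts[line - 1] + col for line, col in locator.positions]
    bounds = [0] + cuts + [len(html)]
    return [html[a:b] for a, b in zip(bounds, bounds[1:])]

tricky = 'x <tag someattr="123>456" type="findMe">y</tag> z'
print(FIXED.search(tricky))       # the regex misses this tag
print(split_with_parser(tricky))  # the parser still splits in front of it
```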

But I guess the regex approach would be beneficial in your (simple) case.

Edit: Altered the regex101 example to better illustrate the solution.

SamWhan