Disclaimer: before the blind "you can't parse HTML with regex" mantra begins, please give me the benefit of the doubt and read this question to the end (and assume I already know that RegEx-ing the HTML will drive you crazy, and have read Parsing Html The Cthulhu Way).
Most of the complaints about matching HTML with regex come from the fact that HTML is loosely formed, so a regex has difficulty coping with its many variations and with user errors, plus some other things such as recursion (arbitrarily nested tags).
However, what if the HTML is actually valid XHTML (or at least more XML-like) that originated from a controlled environment (not a general user-generated HTML document, but, for example, HTML-fragment templates that you would use in a client-side templating engine) and has been both manually checked for errors and validated numerous times?
Let me explain why I'm interested. I'm doing a speed benchmark of different String2DOM techniques in JavaScript, and I've tested everything from innerHTML, outerHTML, insertAdjacentHTML, createRange, DOMParser and document.write (via an iframe) to John Resig's HTMLtoDOM JS library.
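For context, the kind of timing harness I mean looks roughly like the sketch below. The names `benchmark`, `candidates` and `iterations` are made up for illustration, and it assumes a runtime with a global `performance.now()` (browsers and recent Node versions). It is a rough comparison tool, not a rigorous benchmark.

```javascript
// Hypothetical harness: runs every candidate on the same markup string
// and reports total and per-call time, fastest first.
function benchmark(candidates, markup, iterations = 1000) {
  const results = [];
  for (const [name, fn] of Object.entries(candidates)) {
    fn(markup); // one warm-up call so first-run setup cost is not measured
    const start = performance.now();
    for (let i = 0; i < iterations; i++) fn(markup);
    const totalMs = performance.now() - start;
    results.push({ name, totalMs, perCallMs: totalMs / iterations });
  }
  return results.sort((a, b) => a.totalMs - b.totalMs);
}

// Browser-only usage; skipped where there is no DOM (for example in Node).
if (typeof document !== 'undefined') {
  const sample = '<div class="root"><p>Hello <em>world</em></p></div>';
  console.table(benchmark({
    innerHTML: (s) => {
      const host = document.createElement('div');
      host.innerHTML = s;
      return host;
    },
    DOMParser: (s) => new DOMParser().parseFromString(s, 'text/html').body,
  }, sample));
}
```

Each candidate is just a function that takes the string and returns some DOM, so any new technique can be dropped into the same object and compared on equal terms.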
And I'm curious if there is a way to go even faster.
createElement/appendChild (plus setAttribute and createTextNode) is the fastest way to create DOM elements in JavaScript. Regex is the fastest way to traverse large strings. Couldn't these two methods be combined to create a possibly even faster way to parse DOMString fragments into DOM?
An example HTML string:
<div class="root fragment news">
<div class="whitebg" data-name='Freddie Mercury'>
<div id='myID' class="column c2">
<h1>This is my title</h1>
<p>Vivamus urna <em>sed urna ultricies</em> ac<br/>tempor d </p>
<p>Mauris vel neque sit amet Quisque eget odio</p>
</div>
<div class="nfo hide">Lorem <a href='http://google.com/'>ipsum</a></div>
</div>
</div>
So ideally the code would return a DocumentFragment, with regex parsing the XHTML soup and createElement/appendChild (plus setAttribute/createTextNode) filling in the elements. (A similar, but not quite there yet, example is HTML2DOM.)
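To make the idea concrete, here is a rough, untuned sketch of the shape such code could take. The function name `xhtmlToFragment` and the `doc` parameter are made up for illustration. It assumes strictly well-formed, XHTML-like input: quoted attribute values, explicitly closed tags (`<br/>`, not `<br>`), and no comments, CDATA, entities, or script/style content. Whether anything like this can actually beat innerHTML is exactly the open question.

```javascript
// Sketch only, under the assumptions stated above. `doc` is any object that
// exposes createDocumentFragment/createElement/createTextNode; it defaults
// to the browser's document.
function xhtmlToFragment(markup, doc = globalThis.document) {
  // One token per match: closing tag | opening or self-closing tag | text run.
  const tokenRe = /<\/([\w:-]+)\s*>|<([\w:-]+)((?:\s+[\w:-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>|([^<]+)/g;
  // Applied to the attribute part of one opening tag.
  const attrRe = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;

  const fragment = doc.createDocumentFragment();
  // Stack of currently open parents; the fragment itself is the root.
  const stack = [{ node: fragment, name: null }];
  let match;

  while ((match = tokenRe.exec(markup)) !== null) {
    const [, closeName, openName, attrText, selfClosing, text] = match;
    const top = stack[stack.length - 1];

    if (closeName !== undefined) {
      // A closing tag must match the innermost open element.
      if (top.name !== closeName) {
        throw new Error('Mismatched closing tag </' + closeName + '>');
      }
      stack.pop();
    } else if (openName !== undefined) {
      const el = doc.createElement(openName);
      attrRe.lastIndex = 0; // reset the shared global regex before reuse
      let attr;
      while ((attr = attrRe.exec(attrText)) !== null) {
        // attr[2] is a double-quoted value, attr[3] a single-quoted one.
        el.setAttribute(attr[1], attr[2] !== undefined ? attr[2] : attr[3]);
      }
      top.node.appendChild(el);
      if (!selfClosing) stack.push({ node: el, name: openName });
    } else if (text.trim() !== '') {
      // Whitespace-only runs between tags are dropped (unlike innerHTML).
      top.node.appendChild(doc.createTextNode(text));
    }
  }

  if (stack.length !== 1) throw new Error('Unclosed element(s) in markup');
  return fragment;
}
```

In a browser, `container.appendChild(xhtmlToFragment(templateString))` would attach the result. Note that input the token regex does not recognise (a stray `<`, for instance) is silently skipped rather than reported, so this only makes sense for pre-validated templates like the ones described above.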
I (and the rest of the world) would be very, very interested to know whether something like that could beat good old innerHTML at generating DOM from a DOMString in JS. Could it?
Who's game to test their knowledge by making something like that, and claim their place in the annals of Stack Overflow? :)
EDIT2: whoever is blindly down-voting this, could you at least explain what you feel is wrong with the question? I am pretty familiar with the subject, have provided the logic behind it, have explained what is different about this scenario, and have even posted some links to similar solutions. What about you?