You should really avoid using regular expressions to parse HTML, especially when very powerful tools for the job are built right into every browser.
Here is a solution with no regular expressions; I find it pretty simple.
Here is how it works:
- We create an HTML element
- We dump the HTML into the element. The browser already contains a very good HTML parser :) It handles edge cases like spaces in the name, escaped entities, and partial HTML for us, just like it does for web pages.
- We can query the element using the querySelector syntax, or, even simpler, getElementsByTagName if you're old-fashioned.
- We use the textContent property to obtain the text.
Actual code:
var test = '<b><font color="#32748">My string:</font></b><big> My value </big><br>';
// we create an empty element and put the html in it
var div = document.createElement("div");
div.innerHTML = test;
// get the text from the font tag, as you asked for.
var text = div.querySelector("font").textContent; // separate variable, so the input in `test` is not overwritten
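If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: `extractText` and its optional `doc` parameter are names I made up for illustration, and the `null` return for "no match" is a design choice, since `querySelector` returns `null` when nothing matches and reading `.textContent` from that would throw.

```javascript
// Hypothetical helper (not part of any library): returns the text of the
// first element matching `selector` inside the HTML string `html`,
// or null when nothing matches.
// `doc` is an assumption added for testability; it defaults to the page's document.
function extractText(html, selector, doc) {
  doc = doc || document;
  var container = doc.createElement("div"); // detached element, never added to the page
  container.innerHTML = html;               // the browser's parser handles the markup
  var match = container.querySelector(selector);
  return match ? match.textContent : null;  // guard against a missing element
}
```

In a browser, `extractText('<b><font color="#32748">My string:</font></b>', "font")` should give `"My string:"`.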
Note: <font> tags are deprecated and should not be used in new code. I'd consider checking out the current HTML5 spec and seeing how things work in modern HTML.
Note 2: in old IE (8 and earlier) you can't use textContent, so you can use innerText instead (or innerHTML, if the element contains only plain text).
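If you do have to support those old browsers, a fallback along these lines might work. It is a minimal sketch; `readText` is a hypothetical name, and it assumes the element exposes either textContent or innerText.

```javascript
// Hypothetical helper: prefer the standard textContent and fall back to
// innerText where textContent is missing (IE8 and earlier).
function readText(el) {
  // Check the type rather than truthiness so an empty string still counts
  if (typeof el.textContent === "string") {
    return el.textContent;
  }
  return el.innerText;
}
```

You would then call `readText(div.querySelector("font"))` instead of reading textContent directly.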