The general idea is to take a string of HTML, parse it into a document (a tree of DOM elements), and then traverse it to extract information.
Suppose the link is:
<a href="/browse/post/something/"><b>something</b> else</a>
First traverse the tree to find the anchor element, then read its text:
anchor.textContent // returns "something else"
It is simple to extract the text from an element, even when there are other elements below it in the tree that also contain text. This is also more robust than the regex example. If someone added a class attribute to the anchor, the regex in the accepted answer would no longer match the anchor tag, but a traversal-based solution would still work.
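To make the robustness point concrete, here is a minimal sketch of what tree traversal and text extraction do conceptually. It uses plain objects as stand-ins for DOM nodes so that it runs anywhere. The names `textOf` and `findByTag` and the node shape (`tag`, `attrs`, `children`) are assumptions made for illustration, not part of the real DOM API.

```javascript
// Hypothetical plain-object stand-ins for DOM nodes (not the real DOM API).
// An element has a tag, attributes, and children; a text node is just a string.
var anchor = {
  tag: "a",
  attrs: { href: "/browse/post/something/" },
  children: [
    { tag: "b", attrs: {}, children: ["something"] },
    " else"
  ]
};
var root = { tag: "div", attrs: {}, children: [anchor] };

// Concatenate the text of a node and all of its descendants, in document order.
function textOf(node) {
  if (typeof node === "string") return node;
  return node.children.map(textOf).join("");
}

// Depth-first search for every element with the given tag name.
function findByTag(node, tag, found) {
  found = found || [];
  if (typeof node === "string") return found;
  if (node.tag === tag) found.push(node);
  node.children.forEach(function (child) {
    findByTag(child, tag, found);
  });
  return found;
}

var anchors = findByTag(root, "a");
console.log(textOf(anchors[0])); // "something else"

// Adding an attribute changes neither the lookup nor the extracted text.
anchor.attrs["class"] = "post-link";
console.log(textOf(findByTag(root, "a")[0])); // "something else"
```

The lookup depends only on the tag name and the tree structure, so extra attributes on the anchor make no difference. In a browser, getElementsByTagName and textContent do this work for you.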
In the simple case, you can create a div, set its innerHTML to your HTML string, and then traverse it:
var html = '<p><a href="/browse/post/">Lorem</p> <p>Ipsum</p></a>';
var div = document.createElement("div");
div.innerHTML = html;
var anchors = div.getElementsByTagName("a");
for (var i = 0; i < anchors.length; i++) {
    console.log(anchors[i].textContent);
}
A more sophisticated version of this is packaged in the jQuery(string) function:
var html = '<div><p><a href="/browse/post/">Lorem</p> <p>Ipsum</p></a></div>';
jQuery(html).find("a").each(function() {
    console.log(jQuery(this).text());
});
Live example: http://jsfiddle.net/ygcFM/