Wow! This was a surprisingly difficult problem, although it seems like it should be simple at first glance.
The problem is that, strictly speaking, your requirement demands that only text nodes be processed to transform hashtags into links. Existing HTML should not be touched at all.
A naïve approach (seen in the other answers) would attempt to devise a complex regular expression to dodge the HTML. Although this may appear to work for some cases, even nearly all practical cases, it is not foolproof. Regular expressions are not powerful enough to fully parse HTML, because HTML's arbitrarily nested structure is beyond what a regular expression can match. See the excellent and rather famous Stack Overflow answer to "RegEx match open tags except XHTML self-contained tags". It can't be done reliably, so it is better not to attempt it at all.
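To make the failure mode concrete, here is a minimal sketch that applies the hashtag pattern from the question directly to the raw markup of the demo paragraph, instead of to its text nodes. The variable names are only for illustration.

```javascript
// Raw markup of the demo paragraph, as a plain string.
var rawHtml = 'Please visit #Microsoft! #facebook <a href="#link">Somelink</a>';

// Same pattern and replacement as used in linkify(), but applied to the markup itself.
var mangled = rawHtml.replace(
    /#(\w+\.?\w+)/g,
    '<a href="http://example.com?hashtag=$1">#$1</a>'
);

// The "#link" inside the href attribute is matched too, so the existing
// anchor ends up with a link nested inside its attribute value:
//   <a href="<a href="http://example.com?hashtag=link">#link</a>">Somelink</a>
console.log(mangled);
```

The two real hashtags are converted correctly, but the existing link is corrupted, which is exactly the kind of case a text-node-only approach avoids.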
Rather, the correct approach is to traverse the HTML tree using a recursive JavaScript function, and replace all target text nodes with processed versions of themselves, which, importantly, may involve the introduction of (non-text) HTML markup inside the text node.
jQuery can be used to accomplish this with minimal complexity, although the task itself carries a certain amount of inherent complexity which, honestly, can't be avoided.
HTML
<button onclick="tryItClick()">Try it</button>
<p id="demo">Please visit #Microsoft! #facebook <a href="#link">Somelink</a>
</p>
JavaScript
// Some old browsers don't expose the Node type constants; define them if missing.
if (!window.Node) {
    window.Node = {
        ELEMENT_NODE                : 1,
        ATTRIBUTE_NODE              : 2,
        TEXT_NODE                   : 3,
        CDATA_SECTION_NODE          : 4,
        ENTITY_REFERENCE_NODE       : 5,
        ENTITY_NODE                 : 6,
        PROCESSING_INSTRUCTION_NODE : 7,
        COMMENT_NODE                : 8,
        DOCUMENT_NODE               : 9,
        DOCUMENT_TYPE_NODE          : 10,
        DOCUMENT_FRAGMENT_NODE      : 11,
        NOTATION_NODE               : 12
    };
} // end if

// Callback: replace a text node with its own text, with hashtags turned into links.
window.linkify = function($textNode) {
    $textNode.replaceWith($textNode.text().replace(/#(\w+\.?\w+)/g,'<a href="http://example.com?hashtag=$1">#$1</a>'));
}; // end linkify()

// Walk the subtree under $cur, calling callback for every node whose nodeType is whitelisted.
window.processByNodeType = function($cur, nodeTypes, callback, payload ) {
    if (!nodeTypes.length) // accept a single node type as well as an array
        nodeTypes = [nodeTypes];
    for (var i = 0; i < $cur.length; ++i) {
        if ($.inArray($cur.get(i).nodeType, nodeTypes ) >= 0)
            callback($cur.eq(i), $cur, i, payload );
        processByNodeType($cur.eq(i).contents(), nodeTypes, callback, payload );
    } // end for
}; // end processByNodeType()

// Click handler: linkify every text node under #demo.
window.tryItClick = function(ev) {
    var $top = $('#demo');
    processByNodeType($top, Node.TEXT_NODE, linkify );
}; // end tryItClick()
http://jsfiddle.net/3u6jt988/
It's always good to write general code where possible, to maximize reusability, and often simplicity (although too much generality can lead to excessive complexity; there's a tradeoff there). I wrote processByNodeType() to be a very general function that uses jQuery to traverse a subtree of the HTML node tree, starting from a given top node and working its way down. The purpose of the function is to do one thing and one thing only: to call the given callback() function for all nodes encountered during the traversal that have nodeType equal to one of the whitelisted values given in nodeTypes. That's why I included an enumeration of node type constants at the top of the code; see http://code.stephenmorley.org/javascript/dom-nodetype-constants/.
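For comparison, the same traversal idea can be sketched without jQuery, using only the standard nodeType and childNodes properties. This is an illustrative variant, not the code from the answer: walkByNodeType and TEXT_NODE are hypothetical names, and the callback here receives the raw node instead of a jQuery object.

```javascript
var TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE

// Depth-first walk: call callback(node) for every node in the subtree
// whose nodeType is in the whitelist (a single value or an array).
function walkByNodeType(node, nodeTypes, callback) {
    var types = [].concat(nodeTypes); // normalize a single value to an array
    if (types.indexOf(node.nodeType) >= 0)
        callback(node);
    // Snapshot the children first, so a callback that replaces nodes
    // cannot disturb the iteration over a live NodeList.
    var children = Array.prototype.slice.call(node.childNodes || []);
    for (var i = 0; i < children.length; ++i)
        walkByNodeType(children[i], types, callback);
}
```

In a browser this could be called as walkByNodeType(document.getElementById('demo'), TEXT_NODE, someCallback), where someCallback is whatever per-node processing is wanted.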
This function is powerful enough to be called once in response to the click event, passing it the #demo element as the top node, whitelisting only Node.TEXT_NODE nodes, and providing linkify() as the callback.
When linkify() is called, it just takes its first argument, which is the node itself, and does the exact replacement you devised (although capture group backreferences had to be added to properly replace the text with the hashtag). The last piece of the puzzle was to replace the text node with whatever new node structure is needed to effect the replacement, which, if there was indeed a hashtag to replace, would involve the introduction of new HTML structure over the old plain text node. Fortunately, jQuery, whose awesomeness knows no bounds, makes this so incredibly easy that it can be accomplished with a sweet one-liner:
$textNode.replaceWith($textNode.text().replace(/#(\w+\.?\w+)/g,'<a href="http://example.com?hashtag=$1">#$1</a>'));
As you can see, a single call to text() gets the text content of the plain text node, then the replace() function on the string object is called to replace any hashtag with HTML, and then jQuery's replaceWith() method allows us to replace the whole text node with the generated HTML, or with an equivalent plain text node if no substitution was performed.
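One caveat with the one-liner: replaceWith() treats a string argument as HTML, so a text node whose text happens to contain a literal < or an entity-like sequence would be re-parsed as markup. If that matters for your content, a more defensive variant is to build the replacement nodes directly. The following is only a sketch under that assumption; splitHashtags and linkifyTextNode are hypothetical helper names, and the URL is the same placeholder used above.

```javascript
var HASHTAG = /#(\w+\.?\w+)/g; // same pattern as in linkify()

// Pure helper: split text into plain segments (tag === null) and hashtag segments.
function splitHashtags(text) {
    var parts = [], last = 0, m;
    HASHTAG.lastIndex = 0; // the regex is global, so reset its state before reuse
    while ((m = HASHTAG.exec(text)) !== null) {
        if (m.index > last)
            parts.push({ tag: null, text: text.slice(last, m.index) });
        parts.push({ tag: m[1], text: m[0] });
        last = m.index + m[0].length;
    }
    if (last < text.length)
        parts.push({ tag: null, text: text.slice(last) });
    return parts;
}

// DOM part (browser only): replace a raw text node with text nodes and <a> elements.
function linkifyTextNode(textNode) {
    var parts = splitHashtags(textNode.nodeValue);
    var hasTag = parts.some(function (p) { return p.tag !== null; });
    if (!hasTag) return; // nothing to do, leave the node untouched
    var doc = textNode.ownerDocument;
    var frag = doc.createDocumentFragment();
    parts.forEach(function (p) {
        if (p.tag === null) {
            frag.appendChild(doc.createTextNode(p.text)); // never parsed as HTML
            return;
        }
        var a = doc.createElement('a');
        a.href = 'http://example.com?hashtag=' + encodeURIComponent(p.tag);
        a.appendChild(doc.createTextNode(p.text));
        frag.appendChild(a);
    });
    textNode.parentNode.replaceChild(frag, textNode);
}
```

It should plug into the traversal above by unwrapping the jQuery object in the callback, for example processByNodeType($('#demo'), Node.TEXT_NODE, function($n) { linkifyTextNode($n.get(0)); });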
References