5

I'm trying to write a highlight plugin and would like to preserve HTML formatting. Is it possible to ignore all the characters between < and > in a string when doing a replace in JavaScript?

Using the following as an example:

var string = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I would like to be able to achieve the following (replace 'dolor' with 'FOO'):

var string = "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit.";

Or perhaps even this (replace 'span' with 'BAR'):

var string = "Lorem ipsum dolor BAR sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I came very close with an answer given by tambler in "Can you ignore HTML in a string while doing a Replace with jQuery?", but for some reason I just can't get the accepted answer to work.

I'm completely new to regex, so any help would be greatly appreciated.

Jon
  • http://stackoverflow.com/questions/2289552/jquery-can-you-ignore-html-in-string-while-doing-a-replace – ggzone Dec 14 '11 at 10:47
  • Jon, trying to parse html with regex is notoriously difficult. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – graphicdivine Dec 14 '11 at 10:48
  • You should parse the HTML and then iterate recursively over each text node. – Felix Kling Dec 14 '11 at 10:50
  • @graphicdivine he's not trying to parse it, he's just trying to change a word without modifying anything within elements – Prisoner Dec 14 '11 at 10:50
  • _"Is it possible to ignore all the characters between < and > in a string"_ - What if the string contains something like "No html tags here even though 4 **<** 5 Lorem ipsum dolor span 5 **>** 4." – nnnnnn Dec 14 '11 at 11:17

3 Answers

6

Parsing the HTML with the browser's built-in parser via innerHTML and then traversing the DOM is the sensible way to do this. Here's a solution loosely based on another answer:

Live demo: http://jsfiddle.net/FwGuq/1/

Code:

// Reusable generic function
function traverseElement(el, regex, textReplacerFunc) {
    // script and style elements are left alone
    // (tagName is upper case in HTML documents, hence the "i" flag)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                traverseElement(child, regex, textReplacerFunc);
            } else if (child.nodeType == 3) {
                textReplacerFunc(child, regex);
            }
            child = child.previousSibling;
        }
    }
}

// This function does the replacing for every matched piece of text
// and can be customized to do what you like
function textReplacerFunc(textNode, regex) {
    textNode.data = textNode.data.replace(regex, "FOO");
}

// The main function
function replaceWords(html, words) {
    var container = document.createElement("div");
    container.innerHTML = html;

    // Replace the words one at a time to ensure each one gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        traverseElement(container, new RegExp(words[i], "g"), textReplacerFunc);
    }
    return container.innerHTML;
}


var html = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
alert( replaceWords(html, ["dolor"]) );
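
One thing the code above leaves open: each word is passed straight to `new RegExp(words[i], "g")`, so a word containing characters such as `.` or `+` is interpreted as a pattern rather than as literal text. A minimal sketch of one way to guard against that follows. Both `escapeRegExp` and `wordToRegex` are hypothetical helper names that are not part of the answer's code.

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical replacement for the `new RegExp(words[i], "g")` expression.
function wordToRegex(word) {
    return new RegExp(escapeRegExp(word), "g");
}

console.log("1+1=2".replace(wordToRegex("1+1"), "FOO")); // → "FOO=2"
```

Under that assumption, the loop in replaceWords would call `wordToRegex(words[i])` instead of constructing the RegExp directly.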
Tim Down
  • Thanks for such a great answer, Tim. Much appreciated! – Jon Dec 15 '11 at 11:53
  • This is a very good solution, but when you try to include HTML tags in the replacement text they get escaped. For example bolding the search string will result in <string> – Hawkee Nov 13 '12 at 04:24
  • @Hawkee: Yes. Allowing for HTML in the search string completely changes the problem. – Tim Down Nov 13 '12 at 10:00
  • Sorry, my last comment was incorrect. I meant to replace "string" with a bolded "<b>string</b>", which results in the literal text "<b>string</b>". Nonetheless, I found a solution that works quite well: http://stackoverflow.com/questions/11040770/how-to-only-select-text-outside-of-a-tag-in-jquery – Hawkee Nov 13 '12 at 13:43
  • Hey, just wondering if anyone can help me with how I can insert HTML as a replacement? textNode.data = textNode.data.replace(regex, "<span>" + 'FOOO' + "</span>"); this actually renders out the span tag with the content as text instead of treating it as an HTML span tag. Many thanks for your help in advance! – Paul Kirkason Dec 16 '16 at 11:49
1

This solution works in Perl, and should also work in JavaScript, since the pattern only uses features that ECMA-262 regular expressions support:

s,\bdolor\b(?=[^"'][^>]*>),FOO,g

Basically, replace the word only if it is followed by one character that is not a quote, then any run of characters other than >, then a closing >.
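
For reference, here is a sketch of how the same pattern might be written with JavaScript's `String.prototype.replace`. The function name `replaceDolor` is made up for illustration, and the pattern itself is unchanged from the Perl version.

```javascript
// Hypothetical wrapper: the Perl substitution expressed as a JavaScript regex literal.
function replaceDolor(str) {
    return str.replace(/\bdolor\b(?=[^"'][^>]*>)/g, "FOO");
}

var input = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

console.log(replaceDolor(input));
// → "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit."

console.log(replaceDolor("dolor sit amet"));
// → "dolor sit amet" (unchanged: no ">" follows the word)
```

As the second call shows, the lookahead requires a > somewhere after the word, so text after the last tag is never replaced. The pattern also only rejects a word that is immediately followed by a quote, so an attribute value such as class='dolor sit' would still be altered.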

fge
  • Though I unfortunately couldn't get your example to work, thanks all the same for your answer, fge. – Jon Dec 15 '11 at 11:55
  • anyone know the correct JS syntax for this method? other method only works on each word as opposed to taking a string composed of two words where one might be surrounded by HTML tags. – jsky Jun 24 '21 at 03:30
0

Tim Down delivered a cool function. If you want the replacement text to contain HTML, make this small change. The regex must contain a capturing group "()" for $1 to work, for example: let regex = new RegExp('(' + textToReplace + ')', 'gi');

const textReplacerFunc = function(textNode, regex) {
    textNode.parentNode.innerHTML = textNode.data.replace(regex, '<span class="highlight">$1</span>');
};
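
A caveat worth checking before using this: assigning to `parentNode.innerHTML` replaces all of the parent's children, so it appears safe only when the text node is the parent's sole child. Below is a sketch of an alternative that wraps each match with DOM methods instead. `findMatchRanges` and `highlightReplacerFunc` are hypothetical names, and the sketch assumes it is called from a backwards traversal such as Tim Down's traverseElement.

```javascript
// Hypothetical pure helper: list the start/end offsets of every match in a string.
function findMatchRanges(text, regex) {
    // Use a fresh global copy so the caller's lastIndex is left alone.
    var flags = regex.flags.indexOf("g") === -1 ? regex.flags + "g" : regex.flags;
    var re = new RegExp(regex.source, flags);
    var ranges = [], m;
    while ((m = re.exec(text)) !== null) {
        if (m[0].length === 0) {
            re.lastIndex++; // step past empty matches to avoid an infinite loop
            continue;
        }
        ranges.push({ start: m.index, end: m.index + m[0].length });
    }
    return ranges;
}

// Hypothetical replacer: wraps each match in <span class="highlight"> (browser only).
function highlightReplacerFunc(textNode, regex) {
    var ranges = findMatchRanges(textNode.data, regex);
    // Work from the last match to the first so the earlier offsets stay valid.
    for (var i = ranges.length - 1; i >= 0; --i) {
        var matchNode = textNode.splitText(ranges[i].start);  // textNode keeps the text before the match
        matchNode.splitText(ranges[i].end - ranges[i].start); // text after the match becomes its own node
        var span = textNode.ownerDocument.createElement("span");
        span.className = "highlight";
        matchNode.parentNode.replaceChild(span, matchNode);
        span.appendChild(matchNode);
    }
}
```

Because the original text node stays in place and new nodes are only added after it, a traversal that walks from lastChild via previousSibling can continue without revisiting the inserted spans. The matched text is kept as a text node, so it does not need HTML escaping.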
owitec