
I am writing a jQuery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the HTML quite yet.

At the moment my approach is to take the HTML of an entire DOM element, including all nested elements, and simply run a regex find/replace for a given term. In the replacement I wrap a span around the matched term and use that span as my anchor for highlighting, scrolling, etc. It is vital that no characters inside any HTML tags are matched.

This is as close as I have gotten:

(?<=^|>)([^><].*?)(?=<|$)

It does a very good job of capturing all characters that are not inside an HTML tag, but I'm having trouble figuring out how to insert my search term.

Input: Any HTML element (this could be quite large, e.g. <body>)
Search term: 1 or more characters
Replacement text: <span class='highlight'>$1</span>

UPDATE

The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...

Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>
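For reference, current JavaScript engines (ES2018 and later, e.g. Node 10+ or a recent Chrome) do accept lookbehind, so the Regex/Replacement pair above can be tried directly. The sketch below wraps it in a hypothetical `highlightWithLookbehind` helper and assumes the search term contains no regex metacharacters. On a quick check it wraps text between tags as intended, but because `(.*?)` may itself contain `<`, the pattern can also match inside a tag, for example in an attribute value:

```javascript
// Assumes an engine with lookbehind support (ES2018+, e.g. Node 10+ or a recent Chrome).
// highlightWithLookbehind is a hypothetical helper; `term` must not contain regex metacharacters.
function highlightWithLookbehind(html, term, className) {
  // Same pattern and replacement as in the update above.
  const regx = new RegExp("(?<=^|>)(.*?)(" + term + ")(?=.*?<|$)", "gi");
  return html.replace(regx, '$1<span class="' + className + '">$2</span>');
}

// Text between tags is wrapped as intended:
const ok = highlightWithLookbehind("<p>Mary had a lamb</p>", "Mary", "highlight");
// ok === '<p><span class="highlight">Mary</span> had a lamb</p>'

// ...but (.*?) can run across "<", so the match lands inside the title attribute:
const broken = highlightWithLookbehind('<a title="Mary">x</a>', "Mary", "highlight");
// broken === '<a title="<span class="highlight">Mary</span>">x</a>'
```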

However, I am having trouble using it in my JavaScript. With the following code, Chrome gives me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".

var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
   var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
   $(this).html(text);
});

It's breaking on the group (?<=^|>). Is this something clumsy on my part, or a difference between the regex engines?
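Separately from the lookbehind error, the code above concatenates `$this.val()` straight into the pattern, so a search term containing regex metacharacters (`.`, `+`, `(` ...) is either interpreted as regex syntax or makes `new RegExp` throw. A minimal sketch of escaping the term first; `escapeRegExp` is a hypothetical helper name, and the character class is the one commonly used for this purpose (MDN's regular-expressions guide shows the same function):

```javascript
// Hypothetical helper: backslash-escape every character with special meaning in a RegExp.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Stand-in for $this.val(); without escaping, "C++" alone is an invalid pattern.
const term = "C++ (beta)";
const regx = new RegExp("(" + escapeRegExp(term) + ")", "gi");
const out = "Try C++ (beta) today".replace(regx, '<span class="highlight">$1</span>');
// out === 'Try <span class="highlight">C++ (beta)</span> today'
```

Without the escaping step, `new RegExp("(C++ (beta))")` throws a SyntaxError instead of searching for the literal text.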

UPDATE

The reason this regex breaks on that group is that JavaScript did not support regex lookbehind at the time (lookbehind assertions were only added to the language in ES2018). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
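One workaround that needs no lookbehind at all (a different technique from the mimicking approach in that post) is to match tags and text runs with a single alternation and only rewrite the text runs. A rough sketch, assuming `html` is serialized markup such as the output of `.html()`, where a literal `<` in text appears as `&lt;`. `highlightOutsideTags` and `escapeRegExp` are hypothetical names, and the limitations listed in the comments are exactly where an actual HTML parser becomes necessary:

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of `term` that lies outside a tag.
// Not an HTML parser: comments, <script>/<style> bodies, attribute values containing ">"
// and entities such as "&amp;" (searching "amp" would hit it) are not handled.
function highlightOutsideTags(html, term, className) {
  if (term === "") return html; // an empty pattern would match between every character
  const termRe = new RegExp(escapeRegExp(term), "gi");
  // Group 1 = a whole tag, group 2 = a run of text between tags; no lookbehind needed.
  return html.replace(/(<[^>]*>)|([^<]+)/g, (whole, tag, text) => {
    if (tag !== undefined) return tag; // markup passes through untouched
    return text.replace(termRe, (m) => '<span class="' + className + '">' + m + "</span>");
  });
}

const result = highlightOutsideTags('<a title="Mary">Mary and Bob</a>', "mary", "highlight");
// result === '<a title="Mary"><span class="highlight">Mary</span> and Bob</a>'
```

Unlike the lookbehind pattern, the attribute value is never searched, because every tag is consumed whole by group 1 before the text runs are examined.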

doub1ejack
  • [tag:sigh] Please refrain from parsing HTML with RegEx as it will [drive you insane](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost May 02 '12 at 15:19
  • I've got a plan to move to html parsing, but I need a quick proof-of-concept before I'll get the green light on that. – doub1ejack May 02 '12 at 15:37
  • You should have that as your proof of concept, not RegExp. This is a solved problem, please don't overkill yourself with RegExp. – Madara's Ghost May 02 '12 at 15:38
  • @Truth: Thank you for your concern. Please desist. I agree with your statements and embrace your apparent agenda. My question is, how can I insert a search term into this regex string? – doub1ejack May 02 '12 at 16:08
  • Have a look at [mark.js](https://markjs.io/) as it might be the thing you're searching for. – dude May 21 '16 at 14:20

1 Answer


Just use jQuery's built-in .text() method. It returns the combined text of the selected element and all its descendants, with the markup stripped.

For the DOM approach (docs for the Node interface): iterate over all child nodes of an element. If a child is an element node, recurse into it. If it's a text node, search its text (node.data); to highlight a match, truncate the node's text at the match position, then insert a highlight span containing the matched text, followed by a new text node holding the rest of the text.

Example code (adjusted, origin is here):

(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            match = /any regular expression/.exec(text); // indexOf also applicable for a plain string
        if (match && match[0].length > 0) {
            var pos = match.index,
                length = match[0].length; // length of what was actually found
            node.data = text.slice(0, pos); // split into a part before...
            var rest = document.createTextNode(text.slice(pos + length)); // a part after
            var highlight = document.createElement("span"); // and a part between
            highlight.className = "highlight";
            highlight.appendChild(document.createTextNode(match[0]));
            node.parentNode.insertBefore(rest, node.nextSibling); // insert after
            node.parentNode.insertBefore(highlight, node.nextSibling);
            iterate_node(rest); // maybe there are more matches
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        // Iterate over a snapshot of the children: the branch above inserts new siblings,
        // and revisiting the new <span> would wrap its text again, endlessly.
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0; i < children.length; i++) {
            iterate_node(children[i]); // run recursive on DOM
        }
    }
})(content); // any dom node
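The splitting step can also be done in one pass per text node, using each match's own length rather than a fixed `length`. A sketch with two hypothetical helpers: `splitOnMatches` turns a string into plain and highlighted segments (pure JavaScript, so it can be checked outside a browser), and `segmentsToFragment` builds the matching DOM nodes. The latter assumes a browser `document` and is meant to replace a text node via `node.parentNode.replaceChild(fragment, node)`:

```javascript
// Hypothetical helper: split `text` into plain and highlighted segments.
// `re` must carry the g flag so that exec() advances through the string.
function splitOnMatches(text, re) {
  if (!re.global) throw new Error("splitOnMatches needs a regex with the g flag");
  const segments = [];
  let last = 0;
  let m;
  re.lastIndex = 0;
  while ((m = re.exec(text)) !== null) {
    if (m[0] === "") { re.lastIndex++; continue; } // skip empty matches to avoid looping forever
    if (m.index > last) segments.push({ text: text.slice(last, m.index), highlight: false });
    segments.push({ text: m[0], highlight: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), highlight: false });
  return segments;
}

// Hypothetical helper (browser only): text nodes for plain segments, spans for matches.
function segmentsToFragment(segments, className) {
  const fragment = document.createDocumentFragment();
  for (const seg of segments) {
    if (seg.highlight) {
      const span = document.createElement("span");
      span.className = className;
      span.appendChild(document.createTextNode(seg.text));
      fragment.appendChild(span);
    } else {
      fragment.appendChild(document.createTextNode(seg.text));
    }
  }
  return fragment;
}
```

Because the whole text node is swapped out at once, the parent loop should still iterate over a snapshot of `childNodes`, as in the walker above.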

There's also highlight.js, which might be exactly what you want.

Bergi
  • I see how .text() can be used to obtain and replace an element's text, but I don't see how it is possible to use this to search/replace a subset of that element's text. Example: I only want to highlight the word 'and' in a long element. Ideas? – doub1ejack May 02 '12 at 18:37
  • Then you might need to use [native DOM](https://developer.mozilla.org/en/Gecko_DOM_Reference) methods and alter text nodes. – Bergi May 02 '12 at 23:15
  • Cool. I'm good for now, but when I get the OK on this project I think I'll try this approach first. Using the jQuery :contains selector (api.jquery.com/contains-selector/) I should be able to find my search terms in the DOM. Once I have the elements, it should be fairly simple to manipulate the .text() as necessary. Thanks Bergi. – doub1ejack May 03 '12 at 13:21
  • Whoops, spoke too soon. $('#target *:contains("text")') does a good job of finding elements, but it returns the containing element. That element contains a mix of content, my search term, and other HTML. Using .text() strips out tags (unacceptable) and .html() leaves me with the original problem of searching mixed content & markup for the search term. :contains() narrows the playing field, but the search/replace problem remains. @Bergi, did you have a particular native DOM approach in mind? – doub1ejack May 03 '12 at 13:45
  • Yes, I already have coded various text-node-iterators :) Too long for a comment, extended my answer. – Bergi May 03 '12 at 17:45