I'm trying to build a small JavaScript application that loads a third-party site, finds a given word, and highlights its closest context in the document using the jQuery extension Highlight (with a small customization to allow regular expressions).

First, I'm trying to get the application to highlight the surrounding text by setting the context to 500 characters, but for some reason it cuts off in weird places. For this article, I'm trying to match the term Obama, and as you can see from my screenshot, the highlight cuts off where it shouldn't.

Does anyone have any idea what's going on?

$(document).ready(function() {
    $.get(getUrlVars()["url"],
    function(data) {
        var fdata = $(data);
        var associationScope= 500;


        $.each(getUrlVars()["topics"].split(","), function(index, value) {
            if (getUrlVars()["associationScope"] == "context") {
                var associationScopeRegex = "((?!</span>)[\\s\\S]{0," + associationScope + "})" 
                    + value + "((?!<span class=\"associationScope\">)[\\s\\S]{0," + associationScope + "})";

                fdata.highlight(associationScopeRegex, {className: "associationScope"});
            }

            fdata.highlight(value, {className: "topicHighlight"});
        });

        $("#externalPage").html(fdata);

    });
});

Screenshot of the highlighting result

Jimmy C
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Erik Philips Apr 20 '13 at 00:01

1 Answer

You need to escape some regex metacharacters (well, the backslash in your case) when you build the regex from strings:

   var associationScopeRegex = "((?!</span>)(.|\\n|\\r|\\t){0," + associationScope + "})" 
       + value + "((?!<span class=\"associationScope\">)(.|\\n|\\r|\\t){0," + associationScope + "})";

When you build a regex from a string, you have to take into account the fact that the JavaScript parser doesn't know that your string is going to be a regex; it just parses it as a string. The syntax for string constants uses backslashes for some special characters, so those backslashes are consumed by the string parser before the regex engine ever sees them.

(You don't have to double-up on the backslashes for the double-quote characters, because it's OK to leave them as simple double-quotes for the regex.)
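To make the difference concrete, here is a minimal sketch of what the string parser does to single and doubled backslashes before `new RegExp` sees the pattern. The helpers `escapeRegex` and `buildContextRegex` are hypothetical names made up for this illustration; they are not part of jQuery or the Highlight plugin, and escaping the search term is an extra precaution I'm assuming you'd want, not something the question's code does.

```javascript
// Hypothetical helper (not from jQuery or Highlight): escapes regex
// metacharacters so the search term is matched literally.
function escapeRegex(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// In a string literal, "\s" is parsed as plain "s": the backslash is lost.
var single = "[\s\S]{0,3}";    // the string is [sS]{0,3}

// Doubling the backslash keeps one backslash in the resulting string.
var doubled = "[\\s\\S]{0,3}"; // the string is [\s\S]{0,3}

// Hypothetical helper: builds a "term plus up to `scope` characters of
// context on each side" regex from strings, with doubled backslashes.
function buildContextRegex(term, scope) {
    return new RegExp(
        "([\\s\\S]{0," + scope + "})" +
        escapeRegex(term) +
        "([\\s\\S]{0," + scope + "})"
    );
}

var match = buildContextRegex("Obama", 5).exec("President Obama spoke");
// match[1] holds the text before the term, match[2] the text after it.
```

If the pattern were fixed rather than assembled from variables, a regex literal such as `/[\s\S]{0,3}/` would avoid the doubling entirely, since regex literals don't go through string parsing.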

Pointy
  • Thank you for your comment. I updated the regex above using double backslashes (and also simplified it to [\\s\\S]), but the thing is that it still results in exactly the same error... – Jimmy C Apr 20 '13 at 00:04
  • Well perhaps the problem lies in your "small customizations" to the Highlight tool. – Pointy Apr 20 '13 at 00:06