I'm using the jquery.highlight plugin: http://code.google.com/p/gce-empire/source/browse/trunk/jquery.highlight.js?r=2

I'm using it to highlight search results.

The problem is that if I search something like "café" it won't highlight any words.

And if I search for "cafe", even though my results contain both "cafe" and "café", it will only highlight "cafe".

So I need to highlight all "versions" of the word, with or without diacritics.

Is that possible?

GEOCHET
Santiago
  • See the answer from casablanca here: http://stackoverflow.com/questions/4261740/accent-insensitive-regex. Basically, make modifications around line 91 of the jquery.highlight.js, so that the regex now contains character classes. Maybe add an "accentInsensitive" option around line 83. – anon May 20 '11 at 04:16
  • Thanks, but I'm a little lost on how to implement that on my code... – Santiago May 20 '11 at 21:28
  • Ok. I've added an implementation below. – anon May 21 '11 at 01:23

2 Answers


http://jsfiddle.net/nHGU6/

Test HTML:

<div id="wrapper-accent-sensitive">
 <p>cafe</p>
 <p>asdf</p>
 <p>café</p>
</div>
<hr />
<div id="wrapper-not-accent-sensitive">
 <p>cafe</p>
 <p>asdf</p>
 <p>café</p>
</div>

Test CSS:

.yellow {
    background-color: #ffff00;
}

Replacement JavaScript:

jQuery.fn.highlight = function (words, options) {
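    // Base letter -> accented variants to accept; extend this map as needed.
    var accentedForms = {
        'c': 'ç',
        'e': 'é'
    };

    var settings = { className: 'highlight', element: 'span', caseSensitive: false, wordsOnly: false, accentInsensitive: false };
    jQuery.extend(settings, options);

    // Normalize to an array first, so the accent handling also works when an array of words is passed.
    if (words.constructor === String) {
        words = [words];
    }

    if (settings.accentInsensitive) {
        words = jQuery.map(words, function (word) {
            // One pass over the characters: every occurrence is replaced,
            // and already inserted character classes are never re-processed.
            return word.replace(/./g, function (ch) {
                return accentedForms.hasOwnProperty(ch) ? '[' + ch + accentedForms[ch] + ']' : ch;
            });
        });
    }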

    var flag = settings.caseSensitive ? "" : "i";
    var pattern = "(" + words.join("|") + ")";
    if (settings.wordsOnly) {
        pattern = "\\b" + pattern + "\\b";
    }
    var re = new RegExp(pattern, flag);

    return this.each(function () {
        jQuery.highlight(this, re, settings.element, settings.className);
    });
};

Test code:

$(document).ready(function() {
    $("#wrapper-accent-sensitive").highlight("cafe", { className: 'yellow' });
    $("#wrapper-not-accent-sensitive").highlight("cafe", { className: 'yellow', accentInsensitive: true });
});
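
The map above only goes from a plain letter to its accented forms, so searching for "café" would still miss "cafe". A minimal sketch of one way to cover both directions, assuming the runtime supports String.prototype.normalize: strip the diacritics from the search term first, then expand each base letter into a character class. The names ACCENT_VARIANTS, stripDiacritics and accentInsensitiveSource are hypothetical helpers and not part of jquery.highlight, and the map is only an illustrative subset.

```javascript
// Hypothetical helpers, not part of jquery.highlight.
// Illustrative subset only; extend it from a fuller diacritics table.
var ACCENT_VARIANTS = {
    a: 'àáâãäå',
    c: 'ç',
    e: 'èéêë',
    i: 'ìíîï',
    n: 'ñ',
    o: 'òóôõö',
    u: 'ùúûü'
};

function stripDiacritics(text) {
    // NFD splits "é" into "e" + U+0301; the replace drops the combining marks.
    return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

function accentInsensitiveSource(term) {
    var base = stripDiacritics(term);
    var out = '';
    for (var i = 0; i < base.length; i++) {
        var ch = base.charAt(i);
        var variants = ACCENT_VARIANTS[ch.toLowerCase()];
        if (variants) {
            // Accept the plain letter or any of its accented variants.
            out += '[' + ch + variants + ']';
        } else {
            // Escape regex metacharacters in everything else.
            out += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
    }
    return out;
}

var re = new RegExp('(' + accentInsensitiveSource('café') + ')', 'i');
re.test('cafe'); // true
re.test('café'); // true
```

The resulting source string could be used in place of each entry of words before they are joined into the pattern. Page text stored in decomposed form (a plain "e" followed by a combining accent) would only have its base letter matched by this sketch.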
anon
  • Thanks, that works just fine, except with the "wordsOnly: true" option. It must be some problem with the new pattern: pattern = "\\b" + pattern + "\\b";. Also, do you know how this more complete list of diacritics should be implemented in this code? http://lehelk.com/2011/05/06/script-to-remove-diacritics/ – Santiago May 21 '11 at 02:01
  • Wow. Awesome catch! Apparently, this is not an easy problem. Check this out: http://stackoverflow.com/questions/3693750/how-can-i-make-a-regular-expression-which-takes-accented-characters-into-account. In this case, you can use the XRegExp library. As far as removing / normalizing diacritics (which would be the solution by tchrist referenced in my comment above), maybe you can check out http://rishida.net/blog/?p=222. It's pretty brutal, though. – anon May 21 '11 at 02:46
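
The wordsOnly problem raised in the comments comes from \b: in JavaScript it is defined in terms of \w, which covers only ASCII letters, digits and underscore, so there is no boundary between "é" and a following space. A minimal sketch of an alternative, assuming an engine with ES2018 lookbehind and Unicode property escapes (wholeWordRegExp is a hypothetical helper, not part of the plugin):

```javascript
// Hypothetical helper: wraps a pattern source in Unicode-aware word boundaries.
function wholeWordRegExp(source, caseSensitive) {
    // \p{L} = any letter, \p{N} = any number; the lookarounds consume no text.
    var before = '(?<![\\p{L}\\p{N}_])';
    var after = '(?![\\p{L}\\p{N}_])';
    return new RegExp(before + '(' + source + ')' + after, caseSensitive ? 'u' : 'iu');
}

var re = wholeWordRegExp('caf[eé]', false);
re.test('un café noir');              // true
re.test('cafétéria');                 // false: a letter follows "café"
/\bcaf[eé]\b/i.test('un café noir');  // false: no \b after "é"
```

Older browsers reject lookbehind and the u flag with a syntax error, which is where a library such as XRegExp, mentioned above, can help.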

I can recommend a good library that supports this out of the box: highlight.js. Although it is designed for syntax highlighting of code blocks, you can equally well highlight other (key-)words by defining a corresponding language grammar.

Setting the option of lexemes : '[äöüÄÖÜßa-zA-Z]+' in your language specification will enable keywords with German special characters, for example.
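
To see what that character class changes, independent of highlight.js itself, here is a small plain-JavaScript comparison of the ASCII-only \w+ against the pattern quoted above. The sample string and variable names are made up for illustration.

```javascript
var asciiWords = /\w+/g;                  // ASCII letters, digits, underscore only
var germanWords = /[äöüÄÖÜßa-zA-Z]+/g;    // the lexemes pattern quoted above

'Größe und Maß'.match(asciiWords);   // ["Gr", "e", "und", "Ma"]
'Größe und Maß'.match(germanWords);  // ["Größe", "und", "Maß"]
```

With the wider class, a keyword containing an umlaut or ß is seen as a single token instead of being split at the special character.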

Lutz Büch