Here's another approach, based on dynamically building a regexp:
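function wikifyText (startString, endString, text, list) {
    // Escape every non-word character so each phrase is matched literally.
    // (The i flag matters: escaping an uppercase letter could produce \B, \D, \S or \W.)
    list = list.map( function (str) {
        return str.replace( /([^a-z0-9_])/gi, '\\$1' );
    });
    // Reverse-sort so that a longer phrase is tried before any of its prefixes.
    list.sort();
    list.reverse();
    var re = new RegExp( '\\b(' + list.join('|') + ')\\b', 'g' );
    return text.replace( re, startString + '$1' + endString );
}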
(JSFiddle)
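The sort-and-reverse step matters because a JavaScript alternation tries its branches from left to right and takes the first one that matches, not the longest. Here's a minimal sketch of the effect; the phrases are made up for illustration, and `wrap` is a hypothetical helper, not part of the function above:

```javascript
// Hypothetical helper: wrap every match of the alternation in [[...]].
// Assumes the phrases contain no regexp metacharacters.
function wrap(alternatives, text) {
    var re = new RegExp('\\b(' + alternatives.join('|') + ')\\b', 'g');
    return text.replace(re, '[[$1]]');
}

var text = 'I moved to New York City.';

// Shorter phrase first: the regexp never gets to try the longer one.
var shortFirst = wrap(['New York', 'New York City'], text);

// Reverse-sorted: a phrase always comes before any of its own prefixes.
var list = ['New York', 'New York City'];
list.sort();
list.reverse();
var longFirst = wrap(list, text);

console.log(shortFirst); // I moved to [[New York]] City.
console.log(longFirst);  // I moved to [[New York City]].
```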
The \b anchors at both ends of the regexp prevent this version from trying to wikify any partial words, but you could relax this restriction if you wanted. For example, replacing the regexp construction with:
var re = new RegExp( '\\b(' + list.join('|') + ')(?=(e?s)?\\b)', 'g' );
would allow an s or es suffix at the end of the last wikified word (JSFiddle). Note that MediaWiki automatically includes such suffixes as part of the link text when the page is displayed.
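To see what the look-ahead does, here's a small sketch with an illustrative phrase list (assumed to contain no regexp metacharacters, so no escaping is done). The suffix is only checked, not consumed, so it ends up just outside the brackets:

```javascript
// Illustrative phrase list; not from the original question.
var list = ['fox', 'apple'];
var re = new RegExp('\\b(' + list.join('|') + ')(?=(e?s)?\\b)', 'g');

var out = 'An apple, two apples, three foxes, one foxglove.'
    .replace(re, '[[$1]]');

console.log(out);
// An [[apple]], two [[apple]]s, three [[fox]]es, one foxglove.
```

"foxglove" is left alone because neither "g", "s" nor "es" followed by a word boundary comes after "fox".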
Edit: Here's a version that also allows the first letter of each phrase to be case-insensitive, like MediaWiki page titles are. It also replaces the \b anchors with a slightly more Unicode-friendly solution:
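function wikifyText (startString, endString, text, list) {
    list = list.map( function (str) {
        // Build "[Ff]oo" from "foo": both cases of the first character, then the rest.
        var first = str.charAt(0);
        str = first.toUpperCase() + first.toLowerCase() + str.slice(1);
        str = str.replace( /(\W)/g, '\\$1' );  // escape non-word characters
        return str.replace( /^(\\?.\\?.)/, '[$1]' );
    });
    // Reverse-sort so that a longer phrase is tried before any of its prefixes.
    list.sort();
    list.reverse();
    // (^|\W) stands in for a look-behind; (\W|$) also lets a phrase end the text.
    var re = new RegExp( '(^|\\W)(' + list.join('|') + ')(?=(e?s)?(\\W|$))', 'g' );
    return text.replace( re, '$1' + startString + '$2' + endString );
}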
(JSFiddle)
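The per-phrase transformation is the least obvious part, so here it is pulled out on its own. `phrasePattern` is a hypothetical name used only for this sketch; the steps are the ones from the map callback above:

```javascript
// Hypothetical helper isolating the per-phrase step of the function above.
function phrasePattern(str) {
    var first = str.charAt(0);
    // Put both cases of the first character in front of the rest.
    str = first.toUpperCase() + first.toLowerCase() + str.slice(1);
    // Escape every non-word character.
    str = str.replace(/(\W)/g, '\\$1');
    // Wrap the two leading (possibly escaped) characters in [...].
    return str.replace(/^(\\?.\\?.)/, '[$1]');
}

console.log(phrasePattern('foo bar')); // [Ff]oo\ bar
console.log(phrasePattern('C++'));     // [Cc]\+\+
```

Only the first character becomes a two-case character class; the rest of the phrase stays case-sensitive, which is how MediaWiki treats page titles.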
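This would be a lot less messy if JavaScript regexps supported such standard PCRE features as case-insensitive sections, look-behind or Unicode character classes. (ES2018 has since added look-behind assertions and, with the u flag, Unicode property escapes such as \p{L}, but the code above does not rely on them.)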
Because of the last of these missing features, even this solution is still not completely Unicode-aware: it allows links to begin after or end before any character that matches \W, which includes punctuation but also all non-ASCII characters, even letters. (However, non-ASCII letters inside links are handled correctly.) In practice, I don't think this should be a major issue.
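To make the limitation concrete, and as a rough sketch of how it could be avoided in engines that implement the ES2018 regexp additions (look-behind and \p{...} property escapes under the u flag), here's a variant. It is not part of the original solution: `wikifyUnicode` and `escapeRegExp` are hypothetical names, first-letter case folding is left out for brevity, and "word character" is assumed to mean a Unicode letter, a digit or an underscore:

```javascript
// \W treats non-ASCII letters as non-word characters:
console.log(/\W/.test('\u00e9')); // true ("\u00e9" is "é")

// Escape regexp syntax characters (all of these are valid escapes under the u flag).
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical ES2018 variant: Unicode-aware boundaries via look-around.
function wikifyUnicode(startString, endString, text, list) {
    var alternatives = list.map(escapeRegExp);
    alternatives.sort();
    alternatives.reverse();
    var wordChar = '[\\p{L}\\p{N}_]';
    var re = new RegExp(
        '(?<!' + wordChar + ')(' + alternatives.join('|') +
        ')(?=(?:e?s)?(?!' + wordChar + '))', 'gu');
    return text.replace(re, startString + '$1' + endString);
}

// "café" is written with \u00e9 so the precomposed letter is used.
var out = wikifyUnicode('[[', ']]',
    'Un caf\u00e9, d\u00e9caf\u00e9, caf\u00e9s.', ['caf\u00e9']);
console.log(out); // Un [[café]], décafé, [[café]]s.
```

With the \W-based version, the "café" inside "décafé" would be linked too, since "é" counts as a non-word character there.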