Right now, I'm trying to create a script that automatically creates links to other pages in a wiki document.

function createLinks(startingSymbol, endingSymbol, text, links){
    //this needs to be implemented somehow - replace every match of the list of links with a link
}
createLinks("[[", "]]", "This is the text to wikify", ["wikify", "text"]);
//this function would return "This is the [[text]] to [[wikify]]" as its output.

The most obvious solution would be to replace every match of the string text with [[text]], but that causes problems. For example, if I wikified both "some problems" and "problems" within the string "some problems", I would end up with the string "[[some [[problems]]]]". Is there any way to work around this issue?

Nemo
Anderson Green
  • I am essentially asking whether it's possible to replace a string within another string, if and only if it is not between two other strings. (e. g., replace the string `str1` inside the string `str2`, if and only if `str2` is not between the strings `str3` and `str4`). – Anderson Green Dec 30 '12 at 01:19
  • should that read `//this function would return "This is the [[text]] to [[wikify]]" as its output`? – kieran Dec 30 '12 at 01:21
  • It might be possible to do this using the lookahead and lookbehind operators in a Javascript regular expression, but I'm not very familiar with regular expression syntax. – Anderson Green Dec 30 '12 at 01:22
  • @kieran Yes, that would be the correct output in that case (but with `wikify` instead of `wikifiy`). – Anderson Green Dec 30 '12 at 01:22
  • Perhaps I could have written this question more concisely as "How can I replace a string that is not between two other strings"? This is really the only problem I'm facing here. – Anderson Green Dec 30 '12 at 01:25
  • Indeed, perhaps [this question has the answer you're looking for](http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word) – kieran Dec 30 '12 at 01:28
  • @kieran That wouldn't necessarily work in all cases: I would want to avoid wikifying any text that was between the characters `[[` and `]]`, and that's a different problem from the one you just described. Here's a problematic example: `"Be [[careful]], there are [[cars]] on the road."` I would want to avoid wikifying the word `car` inside the string `[[careful]]` or `[[cars]]`. – Anderson Green Dec 30 '12 at 01:32
  • @kieran Here's a potential solution: I could just sort the strings to wikify (so that they would be listed in ascending order of length), and then wikify them in exactly that order. Then I wouldn't need to worry about strings being wikified within the wiki tags. – Anderson Green Dec 30 '12 at 01:37
  • A TiddlyWiki plugin was written for this purpose. Perhaps part of its source code could be reused here. http://weave.tiddlyspot.com/index.html#AutoWeavePlugin – Anderson Green Dec 30 '12 at 01:54
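
The condition described in the comments above (replace a phrase only when it is not already between `[[` and `]]`) can be sketched with a single regexp pass: one alternative consumes existing links untouched, the other captures phrases to wrap. This is only a sketch under assumptions: `wikifyOutsideLinks` and `escapeRegExp` are hypothetical names, and the `\b` anchors assume every phrase begins and ends with an ASCII word character.

```javascript
// Hypothetical helper: wrap each phrase in start/end markers,
// but leave anything already between the markers untouched.
function wikifyOutsideLinks(start, end, text, phrases) {
    if (phrases.length === 0) return text; // an empty alternation would match everywhere

    // Escape regexp metacharacters so phrases and markers match literally.
    function escapeRegExp(s) {
        return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }

    // Longest phrases first, so "some problems" wins over "problems".
    var alternatives = phrases.slice()
        .sort(function (a, b) { return b.length - a.length; })
        .map(escapeRegExp)
        .join('|');

    // First alternative: an existing link. Second (captured): a phrase to link.
    var re = new RegExp(
        escapeRegExp(start) + '[\\s\\S]*?' + escapeRegExp(end) +
        '|\\b(' + alternatives + ')\\b', 'g');

    return text.replace(re, function (match, phrase) {
        // `phrase` is undefined when the existing-link alternative matched.
        return phrase === undefined ? match : start + phrase + end;
    });
}

console.log(wikifyOutsideLinks('[[', ']]',
    'Be [[careful]], there are [[cars]] on the road. A car has some problems.',
    ['car', 'problems', 'some problems']));
// Be [[careful]], there are [[cars]] on the road. A [[car]] has [[some problems]].
```

Because every match is consumed in one left-to-right pass, text inserted by an earlier replacement is never scanned again, which is what prevents the nested "[[some [[problems]]]]" result.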

2 Answers

I've created a working demo of a script that does almost exactly what I need it to do.

http://jsfiddle.net/8JcZC/2/

alert(wikifyText("[[", "]]", "There are cars, be careful, carefully, and with great care!!", ["text", "hoogahjush", "wikify", "car", "careful", "carefully", "great care"]));

function wikifyText(startString, endString, text, list){
    //sort list by length, shortest first
    list.sort(function(a, b){
        return a.length - b.length; // ASC -> a - b; DESC -> b - a
    });
    //wikify each element of the list; note that replace() with a string
    //pattern only replaces the first occurrence in the text
    for(var i = 0; i < list.length; i++){
        text = text.replace(list[i], startString + list[i] + endString);
    }
    return text;
}

A word of caution: this script may wikify words that are part of other words. For example, if "car" is in the list and "careful" is not, then "car" will be wikified inside "careful", like this: "[[car]]eful". I hope that I will be able to work around this limitation.
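
One possible way around the partial-word limitation, sketched here under assumptions: wrap each phrase in `\b` word-boundary anchors before replacing. `wikifyWholeWords` is a hypothetical variant of the function above, not part of the linked demo. It only addresses matches inside longer words; overlapping phrases such as "care" and "great care" would still interfere with each other.

```javascript
// Hypothetical variant of wikifyText that only links whole words.
function wikifyWholeWords(startString, endString, text, list) {
    // Work on a copy, shortest phrase first, as in the answer above.
    var phrases = list.slice().sort(function (a, b) {
        return a.length - b.length;
    });
    for (var i = 0; i < phrases.length; i++) {
        // Escape regexp metacharacters so the phrase is matched literally.
        var escaped = phrases[i].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        // \b on both sides rejects matches inside longer words;
        // the 'g' flag replaces every occurrence, not just the first.
        var re = new RegExp('\\b' + escaped + '\\b', 'g');
        text = text.replace(re, function (match) {
            return startString + match + endString;
        });
    }
    return text;
}

console.log(wikifyWholeWords("[[", "]]", "Be careful", ["car"]));
// Be careful
console.log(wikifyWholeWords("[[", "]]",
    "There are cars, be careful, carefully, and with great care!!",
    ["car", "careful", "carefully", "great care"]));
// There are cars, be [[careful]], [[carefully]], and with [[great care]]!!
```

Note that "cars" is left alone here, because `\b` does not treat the plural as the whole word "car".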

Anderson Green

Here's another approach, based on dynamically building a regexp:

function wikifyText (startString, endString, text, list) {
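    list = list.map( function (str) {
        // escape every non-word character so the phrase is matched literally;
        // letters must stay unescaped, since e.g. \B or \W have special meanings
        return str.replace( /(\W)/g, '\\$1' );
    });
    // reverse lexicographic order puts a longer phrase before any of its
    // prefixes, so the alternation tries "careful" before "car"
    list.sort();
    list.reverse();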
    var re = new RegExp( '\\b(' + list.join('|') + ')\\b', 'g' );
    return text.replace( re, startString + '$1' + endString );
}

(JSFiddle)

The \b anchors at both ends of the regexp prevent this version from trying to wikify any partial words, but you could relax this restriction if you wanted. For example, replacing the regexp construction with:

    var re = new RegExp( '\\b(' + list.join('|') + ')(?=(e?s)?\\b)', 'g' );

would allow an s or es suffix at the end of the last wikified word (JSFiddle). Note that MediaWiki automatically includes such suffixes as part of the link text when the page is displayed.
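
To see what the suffix lookahead does in isolation, here is a small sketch with a hard-coded pattern. The phrases `car`, `box` and `road` and the sample sentence are made up for illustration.

```javascript
// The lookahead (?=(e?s)?\b) accepts an optional "s" or "es" after the
// phrase without consuming it, so the suffix stays outside the link.
var re = new RegExp('\\b(car|box|road)(?=(e?s)?\\b)', 'g');
var linked = 'Two cars and two boxes left the road; a carton stayed.'
    .replace(re, '[[$1]]');

console.log(linked);
// Two [[car]]s and two [[box]]es left the [[road]]; a carton stayed.
```

"carton" is not linked: after "car" the lookahead finds neither a suffix followed by a word boundary nor a word boundary directly.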


Edit: Here's a version that also allows the first letter of each phrase to be case-insensitive, like MediaWiki page titles are. It also replaces the \b anchors with a slightly more Unicode-friendly solution:

function wikifyText (startString, endString, text, list) {
    list = list.map( function (str) {
        var first = str.charAt(0);
        str = first.toUpperCase() + first.toLowerCase() + str.substr(1);
        str = str.replace( /(\W)/ig, '\\$1' );
        return str.replace( /^(\\?.\\?.)/, '[$1]' );
    });
    list.sort();
    list.reverse();
    // (?:\\W|$) also lets a phrase match at the very end of the text
    var re = new RegExp( '(^|\\W)(' + list.join('|') + ')(?=(e?s)?(?:\\W|$))', 'g' );
    return text.replace( re, '$1' + startString + '$2' + endString );
}

(JSFiddle)

This would be a lot less messy if JavaScript regexps supported such standard PCRE features as case-insensitive sections, look-behind or Unicode character classes.

Due to the last of these missing features, even this solution is still not completely Unicode-aware: it allows links to begin after or end before any character that matches \W, which includes punctuation but also all non-ASCII characters, even letters. (However, non-ASCII letters inside links are handled correctly.) In practice, I don't think this should be a major issue.
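
For engines that do support look-behind and Unicode property escapes (both were added to JavaScript in ES2018, which is an assumption about the runtime and not something the code above relies on), the boundary check could be sketched as follows. `wikifyUnicode` is a hypothetical name, and this sketch leaves out the suffix and first-letter handling shown earlier.

```javascript
// Hypothetical Unicode-aware variant; assumes an ES2018+ regexp engine.
function wikifyUnicode(startString, endString, text, list) {
    if (list.length === 0) return text; // an empty alternation would match everywhere

    // Longest phrases first; escape regexp syntax characters.
    var alternatives = list.slice()
        .sort(function (a, b) { return b.length - a.length; })
        .map(function (s) { return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); })
        .join('|');

    // A "word character" here is any Unicode letter, combining mark,
    // digit, or underscore. The phrase must not be preceded or followed by one.
    var wordChar = '[\\p{L}\\p{M}\\p{N}_]';
    var re = new RegExp(
        '(?<!' + wordChar + ')(' + alternatives + ')(?!' + wordChar + ')', 'gu');

    return text.replace(re, function (match) {
        return startString + match + endString;
    });
}

console.log(wikifyUnicode('[[', ']]',
    'Åland is not land; café-goers love a café.',
    ['land', 'café']));
// Åland is not [[land]]; [[café]]-goers love a [[café]].
```

With the \W-based boundaries, "land" inside "Åland" would be linked, because "Å" is a non-word character as far as \W is concerned.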

Ilmari Karonen
  • I've created a clone of Tomboy Notes using my version of the script. It generates links to Wikipedia as you type, and also prints the generated HTML. http://jsfiddle.net/gjqWy/77/ – Anderson Green Dec 30 '12 at 07:48
  • @AndersonGreen: Cool! Wish I could give you a second +1 for that. – Ilmari Karonen Dec 30 '12 at 08:41
  • I have also written a wiki link generator, using much of the same code. It generates wiki markup links instead of HTML links. Currently, it is only capable of wikifying plain text properly. http://jsfiddle.net/jarble/gjqWy/78/ – Anderson Green Jan 05 '13 at 02:56
  • It would also be useful to generate a list of page titles from a selection of text, and then ask the user which titles to wikify. http://stackoverflow.com/questions/14464986/get-a-list-of-all-page-titles-on-wikipedia – Anderson Green Jan 24 '13 at 01:32