I would like to know how MediaWiki transforms its template ("model") syntax, `{{model|options}}`, into HTML code.
I have a regex that works for a simple model, `({{.*?}})`,
but it fails for a nested model (e.g. `{{model|options containing a {{submodel|options}}...}}`).
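To make the failure concrete, here is a small sketch. It is in Python rather than C# only to keep it short, and the sample strings are made up, but the non-greedy pattern behaves the same way in both engines: it stops at the first `}}` it meets, so the outer model is cut off in the middle.

```python
import re

# The non-greedy pattern from the question, with the braces escaped.
SIMPLE = re.compile(r"\{\{.*?\}\}")

flat = "a {{model|options}} b"            # made-up sample input
nested = "a {{model|x {{submodel|y}} z}} b"  # made-up sample input

# Flat case: the whole model is matched.
print(SIMPLE.findall(flat))

# Nested case: the match ends at the inner "}}", so the result is
# "{{model|x {{submodel|y}}" and the outer closing braces are left behind.
print(SIMPLE.findall(nested))
```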
-
[It's not a good idea to try to parse XML with regexes.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) No doubt it uses a real parser. – Tom Zych Sep 10 '11 at 11:56
-
When you ask for `regex` help here, you should always specify the language you are using. The `regex`es of JavaScript are less powerful than the ones of C#, which (at least for Unicode) are less powerful than the ones of Perl – xanatos Sep 10 '11 at 12:08
-
I'm using regexes in combination with C# – Sébastien Sep 10 '11 at 12:11
1 Answer
Remember,
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
That said, you can read "Forum tags. What is the best way to implement them?". There I made an example of nested tags, both with a "pure" regex and with a "more stable" C# parser that uses a few regexes but keeps the stack out of the regex's hands.
You can do it with balancing groups. They aren't part of "base" regex (and some people don't consider them to be true regexes), but I wouldn't build something as big as a wiki on something like a regex. The problem with regexes is that it's quite difficult to write them so that they don't backtrack (there is a way to do it, but it's difficult to build a regex that needs no backtracking or only a limited amount of it), and when they begin to backtrack it's the end: they can stall for minutes searching for the right combination of captures.
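A minimal sketch of the "keep the stack out of the regex" idea, written in Python for brevity (the same loop ports directly to C#). It is not how MediaWiki does it; `expand` and `render` are hypothetical names, and the HTML produced by `render` is invented for the example. The scanner walks the text once, pushes a new buffer on every `{{`, and on every `}}` pops the buffer and replaces it with rendered output, so inner models are expanded before the outer one and nothing ever backtracks.

```python
def render(name, args):
    # Hypothetical renderer: wraps the arguments in a <span>.
    return '<span class="{}">{}</span>'.format(name, " ".join(args))


def expand(text, render):
    """Expand nested {{name|arg|...}} spans, innermost first."""
    stack = [[]]  # stack[0] is the final output; one extra buffer per open "{{"
    i = 0
    while i < len(text):
        if text.startswith("{{", i):
            stack.append([])  # start collecting a new model body
            i += 2
        elif text.startswith("}}", i) and len(stack) > 1:
            body = "".join(stack.pop())  # inner models are already rendered
            name, *args = body.split("|")
            stack[-1].append(render(name.strip(), args))
            i += 2
        else:
            stack[-1].append(text[i])  # ordinary character (or a stray "}}")
            i += 1
    # Any "{{" that was never closed is put back as literal text.
    while len(stack) > 1:
        tail = "".join(stack.pop())
        stack[-1].append("{{" + tail)
    return "".join(stack[0])


print(expand("a {{model|x {{submodel|y}} z}} b", render))
```

One limitation of this sketch: the body is split on `|` after the inner models have been rendered, so a renderer whose output contains `|` would confuse the outer split. A sturdier version would split the arguments while scanning instead.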
-
So, how do you think they transform their models into HTML code? – Sébastien Sep 10 '11 at 12:13
-
@Sébastien By writing a big chunk of code? :-) It's not by fairies. They could have done something like in my first example and then rebuilt the text with HTML codes (it's quite simple from there). But if you need a parser, I'm quite sure someone has already done it. For example try googling for `Creole C#` (on codeplex http://creoleparser.codeplex.com/) – xanatos Sep 10 '11 at 12:22
-
HAHA :D - You know how the smart mediawiki developers parse their unspecified markup? - yes regexes... – sleeplessnerd Oct 25 '11 at 15:58