I would like to know how Wikimedia transforms its model syntax (`{{model|options}}`) into HTML code. I have a regex for a simple model (`{{.*?}}`), but it fails for a nested model (e.g. `{{model|options containing a {{submodel|options}}...}}`).

xanatos
Sébastien
  • [It's not a good idea to try to parse XML with regexes.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) No doubt it uses a real parser. – Tom Zych Sep 10 '11 at 11:56
  • When you ask for `regex` help here, you should always specify the language you are using. The `regex`es of JavaScript are less powerful than the ones of C#, which (at least for Unicode) are less powerful than the ones of Perl – xanatos Sep 10 '11 at 12:08
  • I'm using regexes in combination with C# – Sébastien Sep 10 '11 at 12:11

1 Answer

Remember,

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

That said, you can read "Forum tags. What is the best way to implement them?", where I made an example of nested tags, both with a "pure" regex and with a "more stable" C# parser that uses a few regexes but keeps the stack out of the regex's hands.

You can do it with balancing groups. They aren't part of "base" regex (and some people don't consider them to be true regexes), but the .NET regex engine supports them.

Still, I wouldn't program something as big as a wiki with a regex. The problem with regexes is that it's quite difficult to write them so that they don't backtrack (there is an option to do it, but it's difficult to build a regex that needs no backtracking, or only a limited amount of it), and once they begin to backtrack it's the end: they can stall for minutes searching for the right combination of captures.
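
To illustrate the "keep the stack out of the regex's hands" idea, here is a minimal sketch, written in Python for brevity (the same loop ports directly to C#). The regex only locates the `{{` and `}}` delimiters, and an ordinary list tracks the nesting. The names `TOKEN` and `find_templates` are made up for this example, and it assumes only double-brace delimiters matter (no `{{{parameter}}}` handling). It is not how MediaWiki itself does it.

```python
import re

# The regex only finds delimiters; it never tries to match a whole template.
TOKEN = re.compile(r"\{\{|\}\}")


def find_templates(text):
    """Return (start, end, depth) for every balanced {{...}} span in text.

    Spans are listed in the order their closing "}}" appears, so an inner
    template always comes before the template that contains it.
    Unmatched delimiters are silently ignored.
    """
    stack = []  # start offsets of the "{{" that are currently open
    spans = []
    for match in TOKEN.finditer(text):
        if match.group() == "{{":
            stack.append(match.start())
        elif stack:  # a "}}" closes the most recently opened "{{"
            start = stack.pop()
            spans.append((start, match.end(), len(stack)))
        # a "}}" with nothing open is skipped
    return spans


if __name__ == "__main__":
    sample = "a {{model|x {{sub|y}} z}} b {{other}}"
    for start, end, depth in find_templates(sample):
        print(depth, sample[start:end])
```

Because inner spans are reported before their parents, a renderer could expand the innermost models first and substitute the result into the enclosing one. Each delimiter is visited once, so there is no backtracking to worry about.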

Community
xanatos
  • So, how do you think that they transform their models to html code ? – Sébastien Sep 10 '11 at 12:13
  • @Sébastien By writing a big chunk of code? :-) It's not by fairies. They could have done something like in my first example and then rebuilt the text with HTML codes (it's quite simple from there). But if you need a parser, I'm quite sure someone has already done it. For example try googling for `Creole C#` (on codeplex http://creoleparser.codeplex.com/) – xanatos Sep 10 '11 at 12:22
  • HAHA :D - You know how the smart mediawiki developers parse their unspecified markup? - yes regexes... – sleeplessnerd Oct 25 '11 at 15:58