So I am making a simple BBCode parser in JavaScript, nothing too fancy. I first need a regular expression that matches only BBCode and that pairs the first occurrence of a tag with its last occurrence, rather than stopping at the nearest closing tag. This will help with tags that are nested inside each other, such as

[b][c red]This should output bold red text[/c][/b]

which should be parsed to

<span style="font-weight: bold;"><span style="color: red;">This should output bold red text</span></span>

The current "Master" regex (the one that detects if there is any BBCode in the string) is as follows.

(\[{1}([^\[]{1,3})(| .*?)\]{1}(.*?)\[{1}(\/{1}[^\]]{1,3})\]{1})

Is there any way to alter this in order to detect only the first and last matches?

Note: I want to exclude wikilinks such as [[Main Page]]

1 Answer

Regular expressions aren't the right tool for this job, just as they aren't the right tool for parsing HTML. That is because BBCode, with its nested tags, is a context-free language and not a regular language (hence the name "regular expression").
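For comparison, here is a rough sketch of the non-regex route: a small regex is used only to find individual tags, and a stack keeps track of nesting. The TAGS table, the parseBBCode name, and the rule that a color argument may only contain word characters or # are my own assumptions for illustration, based on the [b] and [c red] example in the question. It returns the input untouched when tags are unbalanced, and it does not escape HTML in the text itself.

```javascript
// Hypothetical tag table, based on the example in the question:
// [b] -> bold span, [c <color>] -> colored span.
const TAGS = new Map([
  ["b", () => "font-weight: bold;"],
  ["c", (arg) => `color: ${arg};`],
]);

// Finds one opening tag ([b], [c red]) or one closing tag ([/b]).
// The (?<!\[) lookbehind skips a bracket that directly follows another
// bracket, which keeps wikilinks such as [[Main Page]] out of the token list.
const TOKEN = /(?<!\[)\[(\/?)(\w{1,3})(?: ([\w#]+))?\]/g;

function parseBBCode(input) {
  const stack = []; // names of the tags that are currently open
  let html = "";
  let last = 0; // index just past the previous token

  for (const m of input.matchAll(TOKEN)) {
    const [raw, slash, name, arg = ""] = m;
    html += input.slice(last, m.index); // plain text before this token
    last = m.index + raw.length;

    if (!TAGS.has(name)) {
      html += raw; // unknown tag: keep it as plain text
    } else if (!slash) {
      stack.push(name);
      html += `<span style="${TAGS.get(name)(arg)}">`;
    } else if (stack[stack.length - 1] === name) {
      stack.pop();
      html += "</span>";
    } else {
      return input; // closing tag does not match the open one: do not parse
    }
  }

  if (stack.length > 0) return input; // something was never closed
  return html + input.slice(last);
}

console.log(parseBBCode("[b][c red]This should output bold red text[/c][/b]"));
```

The stack is what a single regular expression cannot give you: each closing tag is checked against the most recently opened tag, however deep the nesting goes.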

However, I can never complain about someone working on something as a "small problem solving exercise" (that's why I'm on SO). You said my comment helped, so I'll post it here and add an explanation.

\[(\w{1,3})\](.*)\[\/\1\]
<$1>$2</$1>

First we look for [ followed by our first capturing group of 1-3 "word" characters ([a-zA-Z0-9_]), followed by ]. The \w can be replaced with [^\]] to match any character but the closing bracket, or with anything else of your choosing (I'm not entirely sure of the BBCode specs and what a tag can consist of). Then we (greedily) capture 0+ characters into a second group. Finally, we look for [/ followed by the text of our first captured group (\1, which refers back to whatever \w{1,3} matched) and then a ]. Since we used a greedy capture with (.*), it will keep going until it gets to the last matching closing tag.

Now we have 2 captured groups, one with the tag and one with the contents. You can change the [ to < by simply referencing the groups: <$1>$2</$1>
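In JavaScript, the pattern and replacement above could be applied roughly like this. A single replace only converts the outermost pair, so this sketch repeats the replacement until the string stops changing. The bbToAngle name and the loop are my own additions for illustration, not part of the original comment.

```javascript
// Pattern from above: a 1-3 character tag, greedy contents, and a closing
// tag that must repeat the opening name through the \1 backreference.
const BB_TAG = /\[(\w{1,3})\](.*)\[\/\1\]/;

// Hypothetical helper: reapply the replacement until nothing changes, so
// tags nested inside an already converted pair get converted as well.
function bbToAngle(input) {
  let previous;
  let output = input;
  do {
    previous = output;
    output = output.replace(BB_TAG, "<$1>$2</$1>");
  } while (output !== previous);
  return output;
}

console.log(bbToAngle("[b][i]bold italic[/i][/b]"));
// <b><i>bold italic</i></b>
```

Two caveats: because (.*) is greedy, two sibling pairs of the same tag in one string are treated as a single outer pair, and . does not match newlines unless the s flag is added.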

Regex101

Sam
    My small project has a very clear set of syntactical rules that the parser strictly follows. It is set to not parse if it cannot detect properly formatted and valid BBCode. – Lil' Miss Rarity Apr 11 '14 at 19:06