Matching First and Last Occurrences Only - JavaScript

Question

So I am making a simple BBCode parser in JavaScript, nothing too fancy. First I need a regular expression that matches only BBCode and only matches the first and last occurrences of a tag. This will help with tags that are nested inside each other, such as

[b][c red]This should output bold red text[/c][/b]

which should be parsed to

<span style="font-weight: bold;"><span style="color: red;">This should output bold red text</span></span>

The current "Master" regex (the one that detects whether the string contains any BBCode at all) is:

(\[{1}([^\[]{1,3})(| .*?)\]{1}(.*?)\[{1}(\/{1}[^\]]{1,3})\]{1})
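
Running this pattern in JavaScript against the nested example shows the problem: the lazy (.*?) stops at the first closing tag it reaches, so [b] ends up paired with [/c]. This is only an illustration; the variable names are arbitrary.

```javascript
// The "Master" regex from the question, unchanged (the {1} quantifiers are redundant but harmless).
const master = /(\[{1}([^\[]{1,3})(| .*?)\]{1}(.*?)\[{1}(\/{1}[^\]]{1,3})\]{1})/;

const input = "[b][c red]This should output bold red text[/c][/b]";
const m = input.match(master);

// Group 2 holds the opening tag name, group 5 the closing tag.
console.log(m[2]); // b
console.log(m[5]); // /c  (the lazy (.*?) stopped at the FIRST closing tag)
console.log(m[1]); // [b][c red]This should output bold red text[/c]
```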

Is there any way to alter this in order to detect only the first and last matches?

Note: I want to exclude wikilinks such as [[Main Page]].

There are something like 400 BBCode parsers available; why not use one of those? — adeneo, Apr 09 '14 at 16:33
Because I am trying to do this as a small problem solving exercise. — Lil' Miss Rarity, Apr 09 '14 at 16:36
I am not able to understand what your `Master regex` does. Can you give an example here? http://regex101.com/r/xF7lX5 — aelor, Apr 09 '14 at 16:38
Any particular reason why you're using the completely wrong tool for the job for this? — tenub, Apr 09 '14 at 16:50
Maybe [this](http://regex101.com/r/fH6eM5) can get you started... — Sam, Apr 09 '14 at 16:57
@Sam I had figured it all out by the time you answered this. If you could add that as an actual answer I'd be happy to accept it. — Lil' Miss Rarity, Apr 10 '14 at 06:12
@tenub If they are the wrong tools, then what would you suggest using? Second, I did it and have a pretty good parser, although it could be a bit more efficient in its output. — Lil' Miss Rarity, Apr 10 '14 at 06:13

score 0 · Accepted Answer · edited May 23 '17 at 11:57

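Regular expressions aren't the right tool for this job, just as they aren't the right tool for parsing HTML. Like HTML, BBCode with nested tags is a context-free language rather than a regular language (the kind of language regular expressions describe), so a regex cannot in general pair every opening tag with its matching closing tag.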
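
To make the context-free point concrete: checking that tags nest properly means remembering which tags are still open, and a stack does exactly that. The sketch below is not from the answer; it is one possible alternative under stated assumptions. A regex is used only to find individual tags (a regular task), while the stack handles the nesting. The tag table (only b and c), the name bbToHtml, restricting arguments to [#\w]+, and returning null for unknown or unbalanced tags are all hypothetical choices. The lookbehind (?<!\[) needs an ES2018 engine and matchAll an ES2020 one (any current Node.js or browser). The plain text between tags is not HTML-escaped.

```javascript
// Hypothetical tag table covering only the question's two tags.
// Each entry turns the tag's optional argument into an opening HTML tag, or null if invalid.
const OPENERS = new Map([
  ["b", (arg) => (arg === undefined ? '<span style="font-weight: bold;">' : null)],
  ["c", (arg) => (arg === undefined ? null : `<span style="color: ${arg};">`)],
]);

// Tokens look like [name], [name arg] or [/name]. The (?<!\[) and (?!\]) guards
// skip double-bracketed wikilinks such as [[Main Page]].
const TAG_RE = /(?<!\[)\[(\/?)(\w+)(?: ([#\w]+))?\](?!\])/g;

// Convert BBCode to HTML, returning null if tags are unknown, malformed or unbalanced.
function bbToHtml(text) {
  const stack = []; // names of tags that are open but not yet closed
  let html = "";
  let last = 0; // index just past the previous tag
  for (const m of text.matchAll(TAG_RE)) {
    const [token, slash, name, arg] = m;
    html += text.slice(last, m.index); // plain text between tags, copied as-is
    last = m.index + token.length;
    if (slash) {
      // A closing tag must match the most recently opened tag.
      if (arg !== undefined || stack.pop() !== name) return null;
      html += "</span>";
    } else {
      const open = OPENERS.get(name);
      const opening = open ? open(arg) : null;
      if (opening === null) return null;
      stack.push(name);
      html += opening;
    }
  }
  // Any tag left on the stack was never closed.
  return stack.length === 0 ? html + text.slice(last) : null;
}

console.log(bbToHtml("[b][c red]This should output bold red text[/c][/b]"));
// <span style="font-weight: bold;"><span style="color: red;">This should output bold red text</span></span>
console.log(bbToHtml("[b][c red]crossed[/b][/c]")); // null
console.log(bbToHtml("See [[Main Page]] for [b]details[/b]."));
// See [[Main Page]] for <span style="font-weight: bold;">details</span>.
```

The stack is what lets this reject crossed tags like [b][c red]...[/b][/c], which a single regex match cannot reliably detect.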

However, I can never complain about someone working on something as a "small problem solving exercise" (that's why I'm on SO). You said my comment helped, so I'll post it as an answer and add an explanation.

\[(\w{1,3})\](.*)\[\/\1\]
<$1>$2</$1>

First we look for a [ followed by our first capturing group of 1-3 "word" characters (\w, i.e. [a-zA-Z0-9_]), followed by a ]. The \w can be replaced with [^\]] to match any character except the closing bracket, or with anything else of your choosing (I'm not entirely sure of the BBCode spec and what a tag name can consist of). Then we (greedily) capture zero or more characters into a second group. Finally, we look for [/ followed by whatever the first group captured (\1 is a backreference to the text matched by \w{1,3}, not to the pattern itself), followed by a ]. Because (.*) is greedy, the match extends to the last closing tag with that name rather than stopping at the first one.
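
A few quick checks of the pattern in JavaScript illustrate the behaviour described above: the backreference rejects a mismatched closing tag, and the greedy group runs to the last matching close, which also swallows sibling pairs. The last lines show that . does not match line breaks unless the s (dotAll) flag, available since ES2018, is added. The variable names are only for illustration.

```javascript
// Pattern from the answer: group 1 = tag name, group 2 = contents.
const outer = /\[(\w{1,3})\](.*)\[\/\1\]/;

const nested = "[b][i]hi[/i][/b]".match(outer);
console.log(nested[1], nested[2]); // b [i]hi[/i]

// \1 must repeat the text group 1 captured ("b"), so a mismatched close fails.
console.log("[b]hi[/i]".match(outer)); // null

// Greedy (.*) runs to the LAST [/b], swallowing sibling pairs as well.
const siblings = "[b]one[/b] and [b]two[/b]".match(outer);
console.log(siblings[2]); // one[/b] and [b]two

// "." does not match line breaks; the s (dotAll) flag changes that.
const multiline = "[b]line 1\nline 2[/b]";
console.log(multiline.match(outer)); // null
console.log(multiline.match(/\[(\w{1,3})\](.*)\[\/\1\]/s) !== null); // true
```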

Now we have two captured groups, one holding the tag name and one holding the contents. To turn the square-bracket tag into an HTML tag, simply reference the groups in the replacement string: <$1>$2</$1>
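
In JavaScript this replacement can be applied with String.prototype.replace. A single call converts only the outermost pair; recursing into group 2 with a replacer function is one way to convert the nested pairs as well. The helper convertNested is a hypothetical addition, not part of the answer.

```javascript
// Pattern and replacement string from the answer.
const outer = /\[(\w{1,3})\](.*)\[\/\1\]/;

// One replace call converts only the outermost pair; inner tags stay as BBCode.
const once = "[b][i]hi[/i][/b]".replace(outer, "<$1>$2</$1>");
console.log(once); // <b>[i]hi[/i]</b>

// Hypothetical helper: convert the outer pair, then recurse into its contents (group 2).
// Recursion ends because each inner string is shorter than the one containing it.
function convertNested(text) {
  return text.replace(outer, (_whole, tag, inner) => `<${tag}>${convertNested(inner)}</${tag}>`);
}
console.log(convertNested("[b][i]hi[/i][/b]")); // <b><i>hi</i></b>
```

It still inherits the pattern's greedy behaviour, so two sibling pairs such as [b]one[/b] and [b]two[/b] are treated as a single pair.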


My small project has a very clear set of syntax rules that the parser follows strictly. It refuses to parse anything it cannot recognize as properly formatted, valid BBCode. — Lil' Miss Rarity, Apr 11 '14 at 19:06
