Regex to split BBCode into pieces

Question

I have this:

str = "some html code [img]......[/img] some html code [img]......[/img]"

and I want to get this:

["[img]......[/img]","[img]......[/img]"]

regex isn't exactly the best of choices for this. – muhmuhten Sep 25 '10 at 03:05

score 46 · Answer 1 · edited Jan 25 '18 at 11:55

Please don't use BBCode. It's evil.

BBCode came to life when developers were too lazy to parse HTML correctly and decided to invent their own markup language. As with all products of laziness, the result is completely inconsistent, unstandardized, and widely adopted.

Try to use a user-friendlier markup language, like Markdown (that's what Stack Overflow uses) or Textile. Both of them have parsers for Ruby:

Maruku for Markdown
RedCloth for Textile

If you still don't want to heed my advice and choose to go with BBCode, don't reinvent the wheel: use an existing BBCode parser. To answer your question directly, there is the least desirable option: use regex.

/\[img\].*?\[\/img\]/

As seen on rubular. I would use /\[img\](.*?)\[\/img\]/ instead, though, so that it captures the contents inside the img tags. Note that this is fairly fragile and will break if there are nested img tags; hence the advice to use a parser.
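To make the difference between the two patterns concrete, here is a minimal sketch using String#scan. The file names a.png and b.png are made up for illustration; they are not from the question.

```ruby
# Sample input; the file names are illustrative assumptions.
str = "some html code [img]a.png[/img] some html code [img]b.png[/img]"

# Without a capture group, scan returns each whole match.
whole = str.scan(/\[img\].*?\[\/img\]/)

# With a capture group, scan returns one array of groups per match,
# so flatten turns [["a.png"], ["b.png"]] into a flat list.
inner = str.scan(/\[img\](.*?)\[\/img\]/).flatten

p whole  # ["[img]a.png[/img]", "[img]b.png[/img]"]
p inner  # ["a.png", "b.png"]
```

Which form is more useful depends on whether you need the tags themselves or only what is between them.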

edited Jan 25 '18 at 11:55 by Matthias Braun
answered Sep 25 '10 at 02:49 by NullUserException

+1 just for the quote. although it's a parser, not an interpreter. – muhmuhten Sep 25 '10 at 03:03

@sre I knew I couldn't be the *only* one who hated BBCode with a passion. – NullUserException Sep 25 '10 at 03:06

bbcode is an ill-conceived, badly-designed, and generally poorly-implemented html knock-off. its sole redeeming quality is that it tends to be shorter than html. and of course, that's not hard to do. – muhmuhten Sep 25 '10 at 03:10

+1 BBCode is actually HTML with square brackets and synonyms. – BoltClock Oct 07 '10 at 01:17

`BBCode came to life when developers were too lazy to parse HTML correctly and decided to invent their own markup language.` It is not a solid argument but rather a subjective opinion. BBcodes have been there for a long time and are still used on forums. People know them. Instead, what in the world is Textile? I haven't heard of it, and I'm sure most people haven't either. Why reinvent a new bicycle if the old one is firmly doing its job? – user2513149 Jan 22 '18 at 20:54

What's the source of the quote? – Nathan Hinchey Mar 22 '19 at 20:55

score 8 · Accepted Answer · edited May 23 '17 at 10:33

irb(main):001:0> str = "some html code [img]......[/img] some html \
code [img]......[/img]"
=> "some html code [img]......[/img] some html code [img]......[/img]"
irb(main):002:0> str.scan(/\[img\].*?\[\/img\]/)
=> ["[img]......[/img]", "[img]......[/img]"]

Keep in mind that this is a very specific answer that is based on your exact question. Change str by, say, adding an image tag within an image tag, and all Hell will break loose.
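As a rough illustration of that failure mode, here is a sketch with a made-up nested input (the string below is an assumption, not from the question). The non-greedy match stops at the first closing tag it finds, so the outer tag is cut short.

```ruby
# Hypothetical input with one img tag nested inside another.
nested = "x [img]outer [img]inner[/img] tail[/img] y"

# The match starts at the first [img] and ends at the first [/img],
# which belongs to the inner tag, so the outer tag is truncated.
matches = nested.scan(/\[img\].*?\[\/img\]/)

p matches  # ["[img]outer [img]inner[/img]"]
```

The trailing " tail[/img]" is silently dropped, which is the kind of breakage a real parser avoids by tracking open and close tags.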

score 4 · Answer 3 · edited Apr 01 '16 at 13:44

There is a Ruby BBCode parser at Google Code.

Don't use regex for this.

edited Apr 01 '16 at 13:44 by Willi Mentzel
answered Sep 24 '10 at 16:17 by Tomalak

@square: Hm, my reading is that with a parser you can create any output you like, be it HTML or a simple array. This parser is a suggestion only; there are others, I'm sure. Key point is: your time is better spent figuring out how to use a parser for this than trying to do it with regex, even if it seems the other way around at first. – Tomalak Sep 24 '10 at 21:22

score -1 · Answer 4 · answered Sep 25 '10 at 02:40

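str = "some html code [img]......[/img] some html code [img]......[/img]"
# Split on the closing tag, then drop everything up to and including "[img]".
# Note: this yields the tag contents ("......"), not the full "[img]......[/img]" strings.
p str.split("[/img]").map { |x| x.sub(/.*\[img\]/, "") }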

answered Sep 25 '10 at 02:40 by ghostdog74
