Using htmlentities with BBCode

Question

I am trying to find a sound method for supporting BBCode while passing all other data through htmlentities(). I think this should be possible. I was thinking along the lines of exploding the input around the [] symbols, but there may be a better way.

Any ideas?

What sort of data appears between the brackets? IIRC, htmlspecialchars only encodes angle brackets (>, <), ampersands (&) and quotes (", '), so you should be fine if they don't appear within your square brackets. — alex, Jun 23 '09 at 01:57

score 0 · Answer 1 · answered Jun 23 '09 at 09:22

htmlentities() does not parse. Rather, it encodes data so it can be safely displayed in an HTML document.

Your code will follow these steps:

1. Parse the BB-code (by some mechanism). Don't do any escaping yet; just parse the input text into tags.
2. The output of the parser step will be a tree structure, consisting of nodes that represent block tags and nodes that represent plain text (the text between the tags).
3. Render the tree to your output format (HTML). At this point, you escape the plain text in your data structure using htmlentities.
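The parse step could be sketched roughly as below. This is only one possible approach, not the answerer's code: the function name parse_bbcode and the array shapes used for nodes are assumptions made for illustration, and the tokenizer treats any [word] as a tag (no attributes such as [url=...]).

```php
<?php
// Hypothetical node shapes (not from the original answer):
//   ['type' => 'text', 'value' => string]
//   ['type' => 'tag',  'name'  => string, 'children' => array]

function parse_bbcode(string $input): array
{
    // Split on [tag] and [/tag], keeping the tags themselves as tokens.
    $tokens = preg_split(
        '/(\[\/?[a-z]+\])/i',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // The stack holds the nodes that are still open; index 0 is a dummy root.
    $stack = [['type' => 'tag', 'name' => 'root', 'children' => []]];

    foreach ($tokens as $token) {
        $top = count($stack) - 1;
        if (preg_match('/^\[([a-z]+)\]$/i', $token, $m)) {
            // Opening tag: start collecting children for a new node.
            $stack[] = ['type' => 'tag', 'name' => strtolower($m[1]), 'children' => []];
        } elseif (preg_match('/^\[\/([a-z]+)\]$/i', $token, $m)
            && $top > 0
            && $stack[$top]['name'] === strtolower($m[1])) {
            // Matching closing tag: attach the finished node to its parent.
            $node = array_pop($stack);
            $stack[count($stack) - 1]['children'][] = $node;
        } else {
            // Plain text (or a stray closing tag): stored raw, NOT escaped yet.
            $stack[$top]['children'][] = ['type' => 'text', 'value' => $token];
        }
    }

    // Close any tags that were left open at the end of the input.
    while (count($stack) > 1) {
        $node = array_pop($stack);
        $stack[count($stack) - 1]['children'][] = $node;
    }

    return $stack[0]['children'];
}

$tree = parse_bbcode('a <b> [b]bold[/b]');
```

Note that the text nodes still contain the raw characters (including < and >); escaping is deliberately left to the rendering step.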

Your rendering function will be recursive. Some pseudo-functions that specify the relationship:

render( x : plain text ) = htmlentities(x)

render( x : bold tag )   = "<b>" . render( get_contents_of ( x )) . "</b>"

render( x : quote tag )  = "<blockquote>" . 
                           render( get_contents_of( x )) .
                           "</blockquote>"

...

render( x : anything else) = "<b>Invalid tag!</b>"
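As a minimal sketch, the pseudo-functions above might translate into PHP along these lines. The node arrays, the helper render_all and the tag-to-element map are assumptions for illustration; only the [b] and [quote] cases and the "Invalid tag!" fallback come from the pseudo-functions.

```php
<?php
// Assumed node shapes (hypothetical, for illustration only):
//   ['type' => 'text', 'value' => string]
//   ['type' => 'tag',  'name'  => string, 'children' => array]

function render(array $node): string
{
    // Map of BBCode tag names to the HTML elements they render as.
    $htmlTags = ['b' => 'b', 'quote' => 'blockquote'];

    if ($node['type'] === 'text') {
        // Plain text is the only place where escaping happens.
        return htmlentities($node['value'], ENT_QUOTES, 'UTF-8');
    }

    if (!array_key_exists($node['name'], $htmlTags)) {
        return '<b>Invalid tag!</b>';
    }

    $element = $htmlTags[$node['name']];
    return "<$element>" . render_all($node['children']) . "</$element>";
}

// Render a list of sibling nodes and concatenate the results.
function render_all(array $nodes): string
{
    return implode('', array_map('render', $nodes));
}

// A hand-built tree standing in for the output of a parser.
$tree = [
    ['type' => 'text', 'value' => '1 < 2 and '],
    ['type' => 'tag', 'name' => 'b', 'children' => [
        ['type' => 'text', 'value' => '<script>'],
    ]],
];

echo render_all($tree), "\n";
// prints: 1 &lt; 2 and <b>&lt;script&gt;</b>
```

The markup produced for the tags is emitted as-is, while anything the user typed between the tags goes through htmlentities.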

So you see, htmlentities only comes into play when you're rendering your output to HTML, so that the browser does not get confused if your plain text is supposed to contain special characters such as < and >. If you were rendering to plain text, for example, you wouldn't call the function at all.
