
Here is my problem; it's probably a simple fix. I have a regex that I am using to replace a [url] BBCode tag with an HTML link. What I have right now, which is not working, looks like this.

<?php
$input_string = '[url=www.test.com]Test[url]';
$regex = '/\[url=(.+?)](.+?)\[\/url]/is';
$replacement_string = '<a href="$1">$2</a>';
echo preg_replace($regex, $replacement_string, $input_string);
?>

This currently outputs the original $input_string, while I would like it to output the following.

<a href="www.test.com">Test</a>

What am I missing?
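A quick way to confirm that the pattern simply isn't matching (rather than the replacement being wrong) is the optional by-reference `$count` argument of `preg_replace()`, which reports how many replacements were made. Below is a minimal diagnostic sketch that reuses the question's own values. A `null` return would instead indicate a regex engine error such as a malformed pattern:

```php
<?php
// Diagnostic sketch: preg_replace() returns the subject unchanged when the
// pattern finds no match, so inspect the by-reference $count argument.
$input_string = '[url=www.test.com]Test[url]';
$regex = '/\[url=(.+?)](.+?)\[\/url]/is';
$replacement_string = '<a href="$1">$2</a>';

$count = 0;
$output = preg_replace($regex, $replacement_string, $input_string, -1, $count);

if ($output === null) {
    // null signals a PCRE error (e.g. a malformed pattern), not "no match"
    echo 'preg error code: ', preg_last_error(), "\n";
} elseif ($count === 0) {
    // This branch runs here: the input has no closing [/url] to match
    echo "Pattern did not match; the input was returned unchanged.\n";
} else {
    echo $output, "\n";
}
```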

dqhendricks
  • Your regex is looking for `/url`, yet your demo doesn't include it. Is that a typo? – Brad Christie Jan 01 '11 at 23:43
  • What you are missing is that you should not parse structured languages with regular expressions. There are BBCode parsers available for PHP. – Tomalak Jan 01 '11 at 23:43
  • Aside from the fact that there are BBCode PECL extensions available, are there any other reasons I should not use a regex to do this? – dqhendricks Jan 01 '11 at 23:54
  • [This](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) rather famous answer is the reason - as @Tomalak said, "you should not parse structured languages with regular expressions". If regex works for you in your situation, then it's fine, but regexes aren't generally a robust tool to use to parse "structured languages". – thirtydot Jan 01 '11 at 23:57
  • Ha. Okay, but I am not parsing the HTML itself, which is what that answer seems to be addressing. – dqhendricks Jan 02 '11 at 00:09
  • I can see a security hole, however, where the poster could slip JavaScript into the anchor tag. The final regex will have to be a bit more complicated. – dqhendricks Jan 02 '11 at 00:13
  • That is why using a regex for this kind of thing makes it hard: when you have to support corner cases and think about all the subtle ways your input could change. – thirtydot Jan 02 '11 at 00:51
  • Changed the URL-grabbing regex to use [^"] instead of a period, so that the user cannot add any JavaScript to the anchor tag. – dqhendricks Jan 02 '11 at 00:59
  • @dqhendricks: That *might* stop JavaScript (what about single quotes?), but try running your code against this link: http://www.google.com/search?q="nice%20try" – thirtydot Jan 02 '11 at 01:18
  • Single quotes won't break the double-quoted string. If they did, I would replace it with [^"'] – dqhendricks Jan 02 '11 at 05:12
  • I'm not sure what that Google search is supposed to link me to. – dqhendricks Jan 02 '11 at 05:13
  • The point is, for every sophisticated, smug regex you come up with, there will be a comparatively easy attack to knock it over. The problem of parsing BBCode is solved; why do you want to roll your own parser? Bit by bit, your regexes will become bloated, brittle and unmaintainable monsters with every corner case you work into them, and in the end they still won't be 100% safe. Apart from that, in terms of grammar complexity, BBCode is just like HTML, only with square brackets. – Tomalak Jan 02 '11 at 09:14
  • Can the PECL BBCode parser do something like this? '[code]blah[/code]' = ' '.highlight_string('blah').' '? – dqhendricks Jan 02 '11 at 16:44
  • And if so, can you prevent it from parsing anything between the code tags? – dqhendricks Jan 02 '11 at 17:44
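The security concern raised in these comments (a poster slipping JavaScript into the link, and the quoted-URL test case) can be handled without ever-growing character classes such as [^"]. Below is a minimal sketch, not a vetted sanitizer. It assumes the whole post is plain text that should be HTML-escaped, and that only URLs starting with http://, https:// or www. are acceptable. The function name bbcode_urls_to_html is hypothetical:

```php
<?php
// Sketch only: escape the whole input first, then turn [url=...]...[/url]
// into links, accepting only URLs that start with http://, https:// or www.
function bbcode_urls_to_html(string $input): string
{
    // 1. Neutralise any HTML and quotes the poster typed.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Convert url tags; the URL part may not contain whitespace or ']'.
    return preg_replace_callback(
        '/\[url=([^\]\s]+)\](.+?)\[\/url\]/is',
        function (array $m): string {
            // Whitelist the start of the URL so javascript:, data:, etc. fail.
            if (!preg_match('~^(?:https?://|www\.)~i', $m[1])) {
                return $m[0]; // leave the (already escaped) BBCode as plain text
            }
            return '<a href="' . $m[1] . '">' . $m[2] . '</a>';
        },
        $escaped
    );
}

echo bbcode_urls_to_html('[url=www.test.com]Test[/url]'), "\n";
echo bbcode_urls_to_html('[url=javascript:alert(1)]x[/url]'), "\n";
echo bbcode_urls_to_html('[url=http://www.google.com/search?q="nice%20try"]Search[/url]'), "\n";
```

Because everything is escaped before the tags are converted, a double quote in the URL arrives as &quot;. It therefore cannot close the href attribute, and browsers still decode it back to a quote when following the link.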

2 Answers

<?php
$input_string = '[url=www.test.com]Test[/url]';
$regex = '/\[url=(.+?)\](.+?)\[\/url\]/is';
$replacement_string = '<a href="$1">$2</a>';
echo preg_replace($regex, $replacement_string, $input_string);
?>
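  • In your BBCode string, I closed the [url] tag properly with [/url].
  • I escaped the ] characters in the regex. This was not the actual problem: outside a character class, PCRE treats an unescaped ] as a literal, but escaping it makes the pattern easier to read.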

Note that [url]http://example.org[/url] is also a valid way to make a link in BBCode.
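One possible way to support both forms with a single pattern is an optional =URL group combined with preg_replace_callback(). This is a sketch. Like the answer's regex, it does no escaping or URL validation. The function name bbcode_url is hypothetical:

```php
<?php
// Sketch: handle both [url]http://example.org[/url] and
// [url=http://example.org]text[/url] with one pattern (no escaping here).
function bbcode_url(string $input): string
{
    return preg_replace_callback(
        '/\[url(?:=([^\]]+))?\](.+?)\[\/url\]/is',
        function (array $m): string {
            // Group 1 is '' for the [url]...[/url] form, so the content doubles as the href.
            $href = $m[1] !== '' ? $m[1] : $m[2];
            return '<a href="' . $href . '">' . $m[2] . '</a>';
        },
        $input
    );
}

echo bbcode_url('[url=www.test.com]Test[/url]'), "\n"; // <a href="www.test.com">Test</a>
echo bbcode_url('[url]http://example.org[/url]'), "\n"; // <a href="http://example.org">http://example.org</a>
```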

You should listen to the comments suggesting you use an existing BBCode parser.

thirtydot

Change this line as follows, so that it matches the closing [url] exactly as it appears in your input: `$regex = '/\[url=(.+?)\](.+?)\[url\]/is';`

OK, the formatting here is broken. While I figure it out, see this: http://pastebin.com/6pF0FEbA

Sujith Surendranathan