4

As part of some data migration I'm converting user content from bbcode to markdown.

I'm doing the conversion in a custom MySQL 8.0.22 function, which contains the following line to convert bbcode [url] tags to markdown:

...
SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i');
...

This performs as expected, for example:

[url=https://stackoverflow.com/]SO[/url] converts corectly to [SO](https://stackoverflow.com/)

The problem is: some of the URLs contain spaces, which isn't valid markdown and isn't displaying correctly on my client-side.

Is it possible to modify my REGEXP_REPLACE statement to replace spaces in the links with "%20"?

I'd like to do it all in MySQL if possible, but can do the processing on each record in C# if necessary.


For completeness, my entire bbcode to markdown function is:

CREATE DEFINER=`root`@`localhost` FUNCTION `func_bb_to_md`(bbcode MEDIUMTEXT) RETURNS mediumtext CHARSET utf8mb4
    DETERMINISTIC
BEGIN
    DECLARE markdown MEDIUMTEXT;
    SET markdown = bbcode;
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[code\\](.*?)\\[\\/code\\]', '`$1`', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[url\\](.*?)\\[\\/url\\]', '<$1>', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[url=([^\\]]+)\\](.*?)\\[\\/url\\]', '[$2]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[img\\](.*?)\\[\\/img\\]', '![]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[yt\\](.*?)\\[\\/yt\\]', '![]($1)', 1, 0, 'i'); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[b\\](.*?)\\[\\/b\\]', '**$1**', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[i\\](.*?)\\[\\/i\\]', '*$1*', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[u\\](.*?)\\[\\/u\\]', '$1', 1, 0, 'i'); 
    
    SET markdown = REPLACE(markdown, '[list]', ''); 
    SET markdown = REPLACE(markdown, '[list=1]', ''); 
    SET markdown = REPLACE(markdown, '[/list]', ''); 
    SET markdown = REPLACE(markdown, '[*]', '* '); 
    
    SET markdown = REGEXP_REPLACE(markdown, '\\[color=([^\\]]+)\\](.*?)\\[\\/color\\]', '$2', 1, 0, 'i'); 
    SET markdown = REGEXP_REPLACE(markdown, '\\[quote\\](.*?)\\[\\/quote\\]', '> $1', 1, 0, 'i'); 
    
    SET markdown = REPLACE(markdown, ':)', '{{slightly_smiling_face}}');
    SET markdown = REPLACE(markdown, ';)', '{{wink}}');
    SET markdown = REPLACE(markdown, ':D', '{{grin}}');
    SET markdown = REPLACE(markdown, ':P', '{{stuck_out_tongue}}');
    SET markdown = REPLACE(markdown, ':(', '{{frowning_face}}');
    SET markdown = REPLACE(markdown, ':''(', '{{cry}}');
    SET markdown = REPLACE(markdown, ':.', '{{flushed}}');
    SET markdown = REPLACE(markdown, ':|', '{{neutral_face}}');
    SET markdown = REPLACE(markdown, ':O', '{{open_mouth}}');
    SET markdown = REPLACE(markdown, ':@', '{{angry}}');
    SET markdown = REPLACE(markdown, ':S', '{{confused}}');
    SET markdown = REPLACE(markdown, ':$', '{{blush}}');
    SET markdown = REGEXP_REPLACE(markdown, '\\{\\{(.*?)\\}\\}', ':$1:', 1, 0, 'i');
    
    SET markdown = REPLACE(markdown, '\r\n', '\n');
    SET markdown = REPLACE(markdown, '\n', '\r\n');

    RETURN markdown;
END
andrewzuku
  • 43
  • 4
  • In C#, that would be `Regex.Replace(markdown, @"(?i)\[url=([^]]+)](.*?)\[/url]', x => $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ", "%20")})")`. Perhaps, instead of using `.Replace`, you might want `Uri.EscapeDataString(x.Groups[1].Value)` (see [here](https://stackoverflow.com/a/1517709/3832970)). – Wiktor Stribiżew Dec 19 '20 at 19:57
  • I'll give that a go if there's no way to do it in MySQL, thanks :) – andrewzuku Dec 19 '20 at 20:19

1 Answers1

2

In C#, you may use Regex.Replace with a match evaluator to manipulate the captured texts. In MySQL REGEXP_REPLACE, you do not have this option.

So, you can use

var markdown = "[url=https://stackoverflow.com/a b]SO[/url]";
var p = @"(?i)\[url=([^]]+)](.*?)\[/url]";
var result = Regex.Replace(markdown, p, x => 
        $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})");
Console.WriteLine(result);

See this C# demo online.

The (?i)\[url=([^]]+)](.*?)\[/url] regex matches (in a case insensitive way) [url=, captures any one or more chars other than ] into Group 1, then [/url] substring. The match is passed to the match evaluator, x is a match object. $"[{x.Groups[2].Value}]({x.Groups[1].Value.Replace(" ","%20")})") does all the necessary manipulation.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • 1
    I was fairly sure there was no way to do it purely in MySQL, so I'll use this approach to convert the bbcode links to valid URLs, before converting to markdown. Thanks! – andrewzuku Dec 19 '20 at 20:29