I am trying to extract values from a string and construct a URL that includes those values. I went through a dozen regex questions, but I am not quite satisfied with the answers.

I have custom-encoded strings that carry more than one piece of information, and I want to construct a new URL that contains those pieces.

For example, 35afe06d-8393-4559-b6d7-74d35ce131d8|Master should become http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master. My first attempt was:

var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
var pattern = @"((?:[a-f0-9]+-?){5})|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);

However, this replaces each matched group with the full URL. The constraint is that my code knows nothing about input, pattern, replacement, or output. pattern and replacement are two config values, and I don't want to turn them into x pairs of config values. input comes from somewhere else in the application and could use any custom encoding (pipe, colon, ...). output depends on the use case. The pattern can have any number of groups, and the result doesn't even have to be a URL.

I can think of different ways to do this, like parsing the string myself, building a replacement dictionary, or using regex to find the groups and then doing a string replace of $1 => match.Groups[1]. I just feel like there must be an obvious one-liner for this in .NET, since I even remember doing it in PHP.

Answer: It's not a .NET limitation; the problem was simply the unescaped pipe.

Toxantron

2 Answers

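In your pattern (([a-f0-9]+-?){5})|\w+ the unescaped | acts as alternation, so the regex matches either the GUID part or a run of word characters, and Regex.Replace substitutes the full replacement for each of those matches. Escape the pipe (\|) to match it literally, and put the word characters after it in a second capturing group.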
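To make the effect of the unescaped pipe visible, here is a minimal sketch that counts the matches for both variants, using the pattern from the question. The variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";

// Unescaped pipe: alternation between group 1 and group 2.
var unescaped = new Regex(@"((?:[a-f0-9]+-?){5})|(\w+)");
// Escaped pipe: a literal '|' between the two groups.
var escaped = new Regex(@"((?:[a-f0-9]+-?){5})\|(\w+)");

var unescapedMatches = unescaped.Matches(input);
var escapedMatches = escaped.Matches(input);

Console.WriteLine(unescapedMatches.Count); // 2: the GUID, then "Master"
Console.WriteLine(escapedMatches.Count);   // 1: the whole input

// With alternation, each match fills only one of the two groups,
// so the other placeholder in the replacement expands to an empty string.
foreach (Match m in unescapedMatches)
{
    Console.WriteLine($"$1='{m.Groups[1].Value}' $2='{m.Groups[2].Value}'");
}
```

Because Regex.Replace expands the replacement once per match, the unescaped variant produces the URL template twice, each time with one empty placeholder.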

If you repeat this part ([a-f0-9]+-?) 5 times, the match could also end on a hyphen, because the hyphen is optional after every part, including the last one.

To match only values separated by hyphens, match the character class [a-f0-9]+ once, then repeat a group consisting of a - followed by [a-f0-9]+ exactly {4} times:

([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)
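As a rough check of the difference between the two GUID sub-patterns, the sketch below anchors both with ^ and $ and runs them against a few sample strings. The anchors and the sample inputs are assumptions added for illustration; they are not part of the original patterns.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored so that only full-string matches count.
var loose = new Regex(@"^(?:[a-f0-9]+-?){5}$");
var strict = new Regex(@"^[a-f0-9]+(?:-[a-f0-9]+){4}$");

string[] samples =
{
    "35afe06d-8393-4559-b6d7-74d35ce131d8",  // well-formed
    "35afe06d-8393-4559-b6d7-74d35ce131d8-", // trailing hyphen
    "35afe",                                 // no hyphens at all
};

foreach (var s in samples)
{
    Console.WriteLine($"{s}: loose={loose.IsMatch(s)}, strict={strict.IsMatch(s)}");
}
```

The loose pattern accepts all three samples (the last one through backtracking, one character per repetition), while the strict pattern accepts only the first.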

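using System;
using System.Text.RegularExpressions;

var input = "35afe06d-8393-4559-b6d7-74d35ce131d8|Master";
// \| matches the separator literally; $1 is the GUID, $2 the word after the pipe.
var pattern = @"([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)";
var replacement = "http://my-server/media/guid/$1?v=$2";
var output = Regex.Replace(input, pattern, replacement);
Console.WriteLine(output);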

Result

http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master
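Since the question treats pattern and replacement as opaque config values, the same approach should carry over to other separators. The sketch below assumes a hypothetical helper named Expand and a made-up host:port config pair. It uses Match.Result, which expands a replacement string for a single match and, unlike Regex.Replace, drops any text outside the match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: expands the replacement for the first match only.
// Returns the input unchanged when the pattern does not match.
string Expand(string input, string pattern, string replacement)
{
    var match = Regex.Match(input, pattern);
    return match.Success ? match.Result(replacement) : input;
}

// Config pair from the answer above (pipe-separated GUID and name).
var guidUrl = Expand(
    "35afe06d-8393-4559-b6d7-74d35ce131d8|Master",
    @"([a-f0-9]+(?:-[a-f0-9]+){4})\|(\w+)",
    "http://my-server/media/guid/$1?v=$2");

// Made-up second config pair with a colon separator.
var hostPort = Expand("localhost:8080", @"(\w+):(\d+)", "host=$1;port=$2");

Console.WriteLine(guidUrl);  // http://my-server/media/guid/35afe06d-8393-4559-b6d7-74d35ce131d8?v=Master
Console.WriteLine(hostPort); // host=localhost;port=8080
```

Whether to use Regex.Replace or Match.Result depends on whether text around the match should be kept in the output.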
The fourth bird
  • I'll be damned - my mistake was not escaping the damn pipe from my example. Thanks, I just spent 30min researching Regex.Replace docs and SO. – Toxantron Jul 17 '19 at 15:54
  • In the first version of your pattern the second capturing group was not at the right place as well and the pattern would also match ending on a `-`. – The fourth bird Jul 17 '19 at 15:58
  • Yes, those were from creating an abstract SO question, BUT the real error was the unescaped pipe. I edited my question but left the pipe. – Toxantron Jul 17 '19 at 16:00

This expression might also work here:

^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$

The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.

Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s*\|\s*(.*?)\s*$";
        // .NET replacement strings use $1, not \1, to reference a group.
        string substitution = @"http://my-server/media/guid/$1?v=$2";
        string input = @"35afe06d-8393-4559-b6d7-74d35ce131d8|Master
35afe06d-8393-4559-b6d7-74d35ce131d8|  Master  ";
        RegexOptions options = RegexOptions.Multiline;

        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}

Reference

Searching for UUIDs in text with regex

Emma