I'm trying to create a simple regular expression in C# to split a string into tokens. The problem I'm running into is that the pattern I'm using captures an empty string, which throws off my expected results. What can I do to change my regular expression so it doesn't capture an empty string?

var input = "ID=123&User=JohnDoe";
var pattern = "(?:id=)|(?:&user=)";
var tokens = Regex.Split(input, pattern, RegexOptions.IgnoreCase);

// Expected Results
// tokens[0] == "123"
// tokens[1] == "JohnDoe"

// Actual Results
// tokens[0] == ""
// tokens[1] == "123"
// tokens[2] == "JohnDoe"
Halcyon
  • See [Easiest way to parse “querystring” formatted data](https://stackoverflow.com/questions/11956948/easiest-way-to-parse-querystring-formatted-data). – Wiktor Stribiżew Sep 22 '17 at 20:39
  • link doesn't answer OP - it relies on a System.Net.Http static method that doesn't exist in .NET core – Josh E Sep 22 '17 at 20:41
  • @WiktorStribiżew definitely points to a better approach. Your pattern would fail on the semantically identical input `"User=JohnDoe&ID=123"` because of the check for an `&` in the regex. It's best not to reinvent the wheel on this one. – Corey Ogburn Sep 22 '17 at 20:43
  • There are several ways to fix this: 1) [Remove empty items](https://stackoverflow.com/questions/4912365/c-sharp-regex-split-removing-empty-results), 2) use `(?i)(?<=id=)[^&]+` to get the id and `(?i)(?<=user=)[^&]+` to get the user name, 3) etc. – Wiktor Stribiżew Sep 22 '17 at 20:46

2 Answers

While the comments on your question suggesting a different approach may have merit, they don't address your specific question about the RegEx behavior.

I think the behavior comes from how Regex.Split treats a match at the very start of the input: the text before that match is an empty string, and Split still returns it as the first element. Your input begins with "ID=", so the first token is "". I haven't made it to the top level of the RegEx hierarchy of understanding, though, so verify this against your own input.
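A small sketch to check where the empty token comes from. It assumes C# 9+ top-level statements, and the second input string with the `x;` prefix is made up purely for comparison. As far as I know, Regex.Split returns an empty string whenever a match sits at the very start (or end) of the input, regardless of the kind of group used.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the question; variable names are illustrative.
var pattern = "(?:id=)|(?:&user=)";

// The input starts with a match ("ID="), so the text before it is empty.
var leading = Regex.Split("ID=123&User=JohnDoe", pattern, RegexOptions.IgnoreCase);
Console.WriteLine(leading.Length);   // 3
Console.WriteLine(leading[0] == ""); // True

// With some text in front of the first match, no empty token appears.
var prefixed = Regex.Split("x;ID=123&User=JohnDoe", pattern, RegexOptions.IgnoreCase);
Console.WriteLine(string.Join("|", prefixed)); // x;|123|JohnDoe
```

If that holds for your input, the pattern is not really at fault, and dropping the empty entries after the split is the simplest fix.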

Edit:

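A variant of the pattern for the given test case that uses an atomic group (as far as I can tell, Regex.Split still returns the leading empty string whenever the input starts with a match, so double-check the result):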

(?>id=)|(?:&user=)

If none of this is to your liking, you could always tack a predicate onto the tokens list (this needs System.Linq):

tokens.Where(x => !string.IsNullOrWhiteSpace(x))

Josh E

I don't think you can solve this problem with Regex.Split alone, to be honest. One brute-force way is to remove every empty string from the result:

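// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "ID=123&User=JohnDoe";
var pattern = "(?:id=)|(?:&user=)";
// Split, drop the empty tokens, and convert back to an array so tokens[0] still works.
var tokens = Regex.Split(input, pattern, RegexOptions.IgnoreCase).Where(x => x != "").ToArray();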

I think you should use a regex that actually captures the tokens in groups.

var input = "ID=123&User=JohnDoe";
var pattern = "id=(.+)&user=(.+)";
var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
var id = match.Groups[1].Value;   // "123"
var user = match.Groups[2].Value; // "JohnDoe"
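The fixed pattern above depends on id coming before user, which is the ordering problem raised in the comments. A minimal sketch of an order-independent variant follows, assuming C# 9+ top-level statements and that neither keys nor values contain `&` or `=`. `ParsePairs` is a hypothetical helper name, not a library method, and it does no URL decoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every key=value pair into a case-insensitive dictionary.
Dictionary<string, string> ParsePairs(string input)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    // Group 1 is the key (no '&' or '='), group 2 is the value (no '&').
    foreach (Match m in Regex.Matches(input, @"([^&=]+)=([^&]*)"))
    {
        result[m.Groups[1].Value] = m.Groups[2].Value;
    }
    return result;
}

var a = ParsePairs("ID=123&User=JohnDoe");
var b = ParsePairs("User=JohnDoe&ID=123");
Console.WriteLine(a["id"]);   // 123
Console.WriteLine(b["user"]); // JohnDoe
```

For real query strings a dedicated parser is probably still the better choice; this only shows that capturing pairs avoids both the empty token and the ordering assumption.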
Sweeper