
I want to match multiple field/value pairs, each delimited by a colon, on a single line. The fields and values are surrounded by spaces, and a value may itself contain spaces, e.g.

field1   :    value1a  value1b

Expected result:
match1: Group1=field1, Group2=value1a value1b

or

field1   :    value1a  value1b   field2   : value2a value2b

Expected result:
match1: Group1=field1, Group2=value1a value1b
match2: Group1=field2, Group2=value2a value2b

The best I can do right now is (\w+)\s*:\s*(\w+), which captures only the first word of each value:

Regex regex = new Regex(@"(\w+)\s*:\s*(\w+)");
Match m = regex.Match("field1   :    value1a  value1b   field2   : value2a value2b");
while (m.Success)
{
   string f = m.Groups[1].Value.Trim();
   string v = m.Groups[2].Value.Trim();
   m = m.NextMatch(); // advance to the next match, otherwise the loop never ends
}

I guess a lookahead may help, but I don't know how to write it. Thank you.

DayDayHappy

2 Answers


You may try

(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)
  • (\w+) group 1, one or more consecutive word characters
  • \s*:\s* a colon with optional whitespace around it
  • (...) group 2
  • (?:...)* a non-capturing group, repeated zero or more times
  • (?!\s*\w+\s*:). a negative lookahead followed by a dot: the dot consumes one character only if the text at that position does not start with optional whitespace, a word, optional whitespace, and a colon. Thus group 2 never consumes a word that precedes a colon

See the test cases
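
For reference, here is a minimal C# sketch of how the pattern above could be applied to the sample line from the question. The class name TemperedTokenDemo and the helper ParsePairs are hypothetical names introduced only for illustration; the pattern itself is the one given above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class TemperedTokenDemo
{
    // Hypothetical helper (not part of the original answer):
    // returns each field/value pair in the order it appears.
    public static List<KeyValuePair<string, string>> ParsePairs(string text)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        // Group 1 is the field name; group 2 is everything up to the next "word :".
        foreach (Match m in Regex.Matches(text, @"(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)"))
        {
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        var text = "field1   :    value1a  value1b   field2   : value2a value2b";
        foreach (var pair in ParsePairs(text))
        {
            Console.WriteLine("Key: {0}, Value: {1}", pair.Key, pair.Value);
        }
    }
}
```

Group 2 keeps the inner spacing of the value as written, and it can still carry trailing whitespace when the line itself ends with spaces, so calling Trim() as in the question remains a reasonable precaution.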

Hao Wu
  • Thank you, it works. But I don't quite understand what the final dot after the negative lookahead is for. – DayDayHappy May 06 '21 at 02:58
  • The dot just matches any character, but before each character is matched, the negative lookahead checks whether a word followed by a colon starts at that position (after optional whitespace). If it does, the match ends there. The lookahead runs every time the dot is about to match a character. – Hao Wu May 06 '21 at 03:00
  • It works for an arbitrary number of values. If the number of values is fixed at 2, a simple [`(\w+)\s*:\s*(\w+\s+\w+)`](https://regex101.com/r/XUGlnS/1/) would do the trick. – Hao Wu May 06 '21 at 03:04

You can use a regex based on a lazy dot:

var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");

See the C# demo online and the .NET regex demo (please mind that regex101.com does not support .NET regex flavor).

As you can see, there is no need for a tempered greedy token. The regex means:

  • (\w+) - Group 1: one or more letters, digits, or underscores
  • \s*:\s* - a colon enclosed with zero or more whitespace chars
  • (.*?) - Group 2: any zero or more chars other than a newline, as few as possible
  • (?=\s*\w+\s*:|$) - a positive lookahead that stops the match right before either zero or more whitespace chars, one or more word chars, zero or more whitespace chars and a colon, or the end of the string.

Full C# demo:

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "field1   :    value1a  value1b   field2   : value2a value2b";
        var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
        foreach (Match m in matches)
        {
            Console.WriteLine("-- MATCH FOUND --\nKey: {0}, Value: {1}", 
                m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}

Output:

-- MATCH FOUND --
Key: field1, Value: value1a  value1b
-- MATCH FOUND --
Key: field2, Value: value2a value2b
Wiktor Stribiżew