I need to parse a comma-separated list of numbers and number ranges. The strings are entered into a UI by the user and look something like one of these (six different sample inputs):

1-3, 5, 7-10
1
21.1
1.2-3,5.1,7-10.1
1-3, 5.1, 7-10, 21
1.1-3.1,5.1,7.1-10.1

My end goal is to have a collection of numbers and number ranges that I can process later downstream. For example, after parsing the first sample string above, my end result would be a collection containing 3 elements: 1-3, 5 and 7-10.

Using C# and a .NET Regex, this pattern nicely fills the Matches collection with just the items I need (note the use of non-capturing groups):

(\d+(?:\.\d+)?-\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)
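For reference, here is a minimal sketch that runs the pattern above through `Regex.Matches` and collects each whole match (group 0), next to the single-alternative form suggested in the comments (`\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?`). `PatternComparison` and `MatchValues` are hypothetical names. On the sample inputs both patterns should yield the same match values; neither one rejects stray characters, it simply skips over them.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The pattern from the question: a range alternative, then a single-number alternative.
    public const string TwoAlternatives = @"(\d+(?:\.\d+)?-\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)";

    // The shorter form from the comments: one number, optionally followed by "-number".
    public const string OneAlternative = @"\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?";

    // Returns the text of every match (group 0), in order.
    public static string[] MatchValues(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

For example, `MatchValues(PatternComparison.OneAlternative, "1-3, 5a")` returns `1-3` and `5`, silently dropping the `a`.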

I have two questions though:

  1. Do I need all of that in my pattern, or is a shorter pattern possible?

  2. Is there something I can add to the pattern so that it returns 0 matches when the string contains invalid characters? For example, if I include an alphabetic character anywhere in the string, I want no matches at all. Right now I do this in two passes: one to validate that the string contains only valid characters [\d,.- ], and another to get the matches once the first pass has succeeded.
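One way to get the behaviour asked for in question 2 from a single regex, sketched under the assumption that getting each item back as a string is acceptable: anchor the pattern at both ends so the whole input must be a comma-separated list of items, and rely on .NET recording every repetition of a named group in `Group.Captures`. `SinglePassParser` and `ParseItems` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassParser
{
    // One number: digits with an optional fractional part.
    private const string Num = @"\d+(?:\.\d+)?";

    // One item: a number, optionally followed by "-number" (spaces allowed around '-').
    private const string Item = Num + @"(?:\s*-\s*" + Num + ")?";

    // The whole input must be a comma-separated list of items. The same group name
    // is used twice so every item lands in one CaptureCollection. \z (not $) keeps
    // a trailing newline from being accepted.
    private static readonly Regex ListPattern = new Regex(
        @"^\s*(?<item>" + Item + @")(?:\s*,\s*(?<item>" + Item + @"))*\s*\z");

    // Returns every item with whitespace removed, or an empty list
    // if any part of the input is invalid.
    public static IReadOnlyList<string> ParseItems(string input)
    {
        Match m = ListPattern.Match(input);
        if (!m.Success)
            return Array.Empty<string>();

        return m.Groups["item"].Captures
            .Cast<Capture>()
            .Select(c => Regex.Replace(c.Value, @"\s+", string.Empty))
            .ToList();
    }
}
```

Because the anchors make the match all-or-nothing, `ParseItems("1.1-3.1,5.1,7.1-10.1a")` returns an empty list, and so does input such as `1..2`, which a character-class check like [\d,.- ] would accept.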

Thanks in advance for your ideas.

Update:

Here's the solution I ended up going with (see @Xiaoy312's answer):

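using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static IEnumerable<DataRange> ParseInput(string input)
{
    // Spaces are allowed anywhere, so drop them before validating and matching.
    string cleaned = input.Replace(" ", string.Empty);

    // First pass: reject the whole string if it contains any other character.
    if (!Regex.IsMatch(cleaned, @"^[\d\.,\-]+$"))
        return Enumerable.Empty<DataRange>();

    // Second pass: each match is a number A, optionally followed by "-" and a number B.
    return Regex.Matches(cleaned,
        @"(?<A>\d+(?:\.\d+)?)(?:-(?<B>\d+(?:\.\d+)?))?").Cast<Match>()
        .Select(m => new DataRange
        {
            A = double.Parse(m.Groups["A"].Value, CultureInfo.InvariantCulture),
            B = m.Groups["B"].Success
                ? double.Parse(m.Groups["B"].Value, CultureInfo.InvariantCulture)
                : (double?)null
        });
}

public class DataRange
{
    public double A;
    public double? B;
}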

Here's sample usage:

static void Main(string[] args)
{
    string[] inputs =
    {
        "1",
        "21.1",
        "1-3,5,7-10",
        "1.2-3,5.1,7-10.1",
        "1-3, 5.1,  7-10,21",
        "1.1-3.1,5.1,7.1-10.1",
        "1.1-3.1,5.1,7.1-10.1a"   // contains an invalid character, so nothing is printed
    };

    Console.WriteLine("A\tB");
    foreach (string input in inputs)
        foreach (DataRange i in ParseInput(input))
            Console.WriteLine("{0}\t{1}", i.A, i.B);
}

Sample output:

A       B
1
21.1
1       3
5
7       10
1.2     3
5.1
7       10.1
1       3
5.1
7       10
21
1.1     3.1
5.1
7.1     10.1
KyleLib
  • What are you doing that couldn't be accomplished with two or three [`String.Split()`](http://msdn.microsoft.com/en-us/library/system.string.split%28v=vs.110%29.aspx) calls? – valverij Jan 07 '15 at 18:09
  • Something along the lines of `\d*(\.\d*)?(-(\d*(\.\d*)?))?` should be a valid simplification. To validate the whole thing you'd have to wrap that regex in a capture group inside another one that includes the commas and optional whitespace. – Ant P Jan 07 '15 at 18:19
  • I'd rather say `\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?`. Use the `0` capture group (the whole match). – Lucas Trzesniewski Jan 07 '15 at 18:21
  • Saying that, you probably are better off with just a `String.Split` and `double.TryParse`. – Ant P Jan 07 '15 at 18:25
  • @valverij, Nothing out of the ordinary. I'm comfortable enough with regex that I chose that route. – KyleLib Jan 07 '15 at 18:27
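Following the `String.Split` / `double.TryParse` suggestion in the comments, a regex-free version might look roughly like this. `SplitParser` and `TryParseNumber` are hypothetical names, `DataRange` mirrors the class from the update above, and the `NumberStyles` flags are an assumption. Which numbers are accepted is governed by those flags rather than by a pattern, so edge cases (a leading or trailing decimal point, for instance) may not behave exactly like the regex versions.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Same shape as the DataRange class used in the update.
public class DataRange
{
    public double A;
    public double? B;
}

public static class SplitParser
{
    // Parses input such as "1-3, 5.1, 7-10" without a regex.
    // Returns an empty list if any item is malformed.
    public static List<DataRange> ParseInput(string input)
    {
        var result = new List<DataRange>();

        foreach (string item in input.Split(','))
        {
            // "7-10" -> ["7", "10"]; "5" -> ["5"]; more than one '-' is invalid.
            string[] bounds = item.Split('-');
            if (bounds.Length > 2)
                return new List<DataRange>();

            double a;
            if (!TryParseNumber(bounds[0], out a))
                return new List<DataRange>();

            double? b = null;
            if (bounds.Length == 2)
            {
                double upper;
                if (!TryParseNumber(bounds[1], out upper))
                    return new List<DataRange>();
                b = upper;
            }

            result.Add(new DataRange { A = a, B = b });
        }

        return result;
    }

    // Assumed rules: surrounding spaces allowed, '.' as the decimal point,
    // no sign, exponent or thousands separator.
    private static bool TryParseNumber(string text, out double value)
    {
        const NumberStyles style = NumberStyles.AllowLeadingWhite
                                 | NumberStyles.AllowTrailingWhite
                                 | NumberStyles.AllowDecimalPoint;
        return double.TryParse(text, style, CultureInfo.InvariantCulture, out value);
    }
}
```

Each early `return` makes the result all-or-nothing, which matches the "0 matches on invalid input" requirement from question 2.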

1 Answer

First rushed attempt at it:

public IEnumerable<object> ParseInput(string input)
{
    return Regex.Matches(input.Replace(" ", string.Empty), @"(?<A>\d+(\.\d+)?)(-(?<B>\d+(\.\d+)?))?").Cast<Match>()
        .Select(m => new
        { 
            A = m.Groups["A"].Value,  
            B = m.Groups["B"].Value
        });
}
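In the rushed attempt `B` is just `m.Groups["B"].Value`, so an item with no upper bound comes back as an empty string. A small sketch (hypothetical names) of what the optional named group reports in each case; this is why the fixed version checks `Groups["B"].Success` before calling `double.Parse`.

```csharp
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // The answer's pattern: number A, optionally followed by "-" and number B.
    private const string Pattern = @"(?<A>\d+(\.\d+)?)(-(?<B>\d+(\.\d+)?))?";

    // Describes group B for a single item such as "7-10" or "5".
    public static string DescribeB(string item)
    {
        Match m = Regex.Match(item, Pattern);
        Group b = m.Groups["B"];
        // When the optional part does not participate, Success is false and Value is "".
        return string.Format("Success={0}, Value=\"{1}\"", b.Success, b.Value);
    }
}
```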

Fixed:

public IEnumerable<DataRange> ParseInput(string input)
{
    // Allow only digits, '.', ',', '-' and spaces (line breaks are removed first).
    if (!Regex.IsMatch(input.Replace(Environment.NewLine, string.Empty), @"^[\d\.,\- ]+$"))
        return Enumerable.Empty<DataRange>();

    return input
        .Replace(" ", string.Empty)
        .Split(new[] { Environment.NewLine, "," }, StringSplitOptions.RemoveEmptyEntries)
        .Select(x => Regex.Match(x, @"(?<A>\d+(\.\d+)?)(-(?<B>\d+(\.\d+)?))?"))
        // A piece such as "-" or "." passes the character check but contains no number.
        .Where(m => m.Success)
        .Select(m => new DataRange
        {
            A = double.Parse(m.Groups["A"].Value, System.Globalization.CultureInfo.InvariantCulture),
            B = m.Groups["B"].Success ? double.Parse(m.Groups["B"].Value, System.Globalization.CultureInfo.InvariantCulture) : (double?)null
        });
}

public class DataRange
{
    public double A;
    public double? B;
}

Input:

    const string SampleInput = 
    @"1-3, 5, 7-10, 
1
21.1
1.2-3,5.1,7-10.1
1-3, 5.1, 7-10, 21
1.1-3.1,5.1,7.1-10.1";

Output:

A B
1 3 
5 null 
7 10 
1 null 
21,1 null 
1,2 3 
5,1 null 
7 10,1 
1 3 
5,1 null 
7 10 
21 null 
1,1 3,1 
5,1 null 
7,1 10,1 
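The decimal commas in the output above (`21,1`, `10,1`) presumably come from how the values were printed rather than how they were parsed: `double.Parse` is given `CultureInfo.InvariantCulture`, so `.` is always read as the decimal point, while printing used a culture whose decimal separator is `,`. A minimal sketch of the difference; `CultureDemo` is a hypothetical name, and the comma format is built by hand (an assumption) so the example does not depend on installed culture data.

```csharp
using System.Globalization;

public static class CultureDemo
{
    // A number format whose decimal separator is ',' (as in many European cultures).
    private static readonly NumberFormatInfo CommaDecimal =
        new NumberFormatInfo { NumberDecimalSeparator = "," };

    // Formats with '.' regardless of the machine's current culture.
    public static string Invariant(double d)
    {
        return d.ToString(CultureInfo.InvariantCulture);
    }

    // Formats with ',' as the decimal separator, like the output above.
    public static string WithCommaDecimal(double d)
    {
        return d.ToString(CommaDecimal);
    }
}
```

Passing `CultureInfo.InvariantCulture` to `ToString` (or to `Console.WriteLine` via `string.Format(CultureInfo.InvariantCulture, ...)`) keeps the printed values in the same `1.2` form as the input.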
Xiaoy312
  • Thanks!!! Your "First rushed attempt" was plenty, it got me going. Then I saw your "Fixed" example which was above and beyond...thanks for that. FYI, there's really no need to parse out newlines, as each line in my original set of samples was an input, i.e. six separate inputs in all, each parsed independently of the other. – KyleLib Jan 07 '15 at 19:04
  • Oh, I thought it was one single input. Haha. – Xiaoy312 Jan 07 '15 at 19:06