If you fancy a LINQ approach, you can add a named capture group to the regex, filter the items that match it, group them by the captured number and finally take only the first string for each number. I like the readability of this solution, but I wouldn't be surprised if there is a more efficient way of eliminating the duplicates. Let's see if somebody else comes up with a different approach.
Something like this:
list.Where(s => regex.IsMatch(s))
.GroupBy(s => regex.Match(s).Groups["num"].Value)
.Select(g => g.First())
You can give it a try with this sample:
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // Captures the leading number (followed by a dot) in the "num" group.
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static void Main()
    {
        var list = new [] {
            "1.one",
            "2. two",
            "no number",
            "2.duplicate",
            "300. three hundred",
            "4-ignore this"
        };

        // Keep only the matching strings, then the first string for each number.
        var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
                                      .GroupBy(s => regex.Match(s).Groups["num"].Value)
                                      .Select(g => g.First());

        distinctWithNumbers.ToList().ForEach(Console.WriteLine);
        Console.ReadKey();
    }
}
You can try this approach in this fiddle
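On the efficiency point, one possible variation is to run the regex only once per string instead of once in Where() and again in GroupBy(). The sketch below is just an illustration of that idea, not a measured improvement; the anonymous-type property names Text and Match are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static void Main()
    {
        var list = new[] { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };

        // Match each string once and carry the Match object along with the text.
        var distinctWithNumbers = list
            .Select(s => new { Text = s, Match = regex.Match(s) })
            .Where(x => x.Match.Success)
            .GroupBy(x => x.Match.Groups["num"].Value)
            .Select(g => g.First().Text);

        foreach (var item in distinctWithNumbers)
            Console.WriteLine(item);
    }
}
```

The result should be the same strings as in the sample above; the only difference is that every string is matched a single time.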
As pointed out by @orad in the comments, MoreLINQ has a DistinctBy() extension that could be used to eliminate the duplicates instead of grouping and then taking the first item of each group:
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
.DistinctBy(s => regex.Match(s).Groups["num"].Value);
Try it in this fiddle
EDIT
If you want to use your comparer, you need to implement GetHashCode so that it uses the expression as well:
public int GetHashCode(T obj)
{
return _expr.Invoke(obj).GetHashCode();
}
Then you can use the comparer with a lambda function that takes a string and gets the number using the regex:
var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
var distinctWithNumbers = list.Where(s => regex.IsMatch(s)).Distinct(comparer);
I have created another fiddle with this approach.
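For reference, a complete comparer along those lines could look like the sketch below. The class from the question is not reproduced here, so the Func<T, object> field type, the constructor and the Equals body are assumptions; only the idea of hashing the projected value comes from the snippet above. The null check in GetHashCode is an extra precaution added for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical comparer: two items are equal when their projected keys are equal.
public class GenericCompare<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _expr;

    public GenericCompare(Func<T, object> expr)
    {
        _expr = expr;
    }

    public bool Equals(T x, T y)
    {
        // Compare the projected keys rather than the items themselves.
        return object.Equals(_expr(x), _expr(y));
    }

    public int GetHashCode(T obj)
    {
        // Must be consistent with Equals, so hash the projected key as well.
        var key = _expr(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class Demo
{
    public static void Main()
    {
        var regex = new Regex(@"^(?<num>\d+)\.");
        var list = new[] { "1.one", "2. two", "2.duplicate" };
        var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);

        foreach (var item in list.Where(s => regex.IsMatch(s)).Distinct(comparer))
            Console.WriteLine(item);
    }
}
```

Distinct() only calls Equals on items whose hash codes collide, which is why a GetHashCode that ignores the expression would let "2. two" and "2.duplicate" both through.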
Using lookahead regex
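You can use either of these 2 approaches with the regex @"^\d+(?=\.)". The lookahead (?=\.) requires the dot to be there but leaves it out of the match, so the match itself is just the number.
Just replace the lambda expression that gets the "num" group, s => regex.Match(s).Groups["num"].Value, with an expression that gets the whole regex match, s => regex.Match(s).Value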
Updated fiddle here.
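As a rough illustration, the GroupBy() version with the lookahead regex could look like this. It reuses the sample list from above and is only a sketch of the change described, not the contents of the fiddle.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // The lookahead (?=\.) checks for the dot without including it in the match.
    private static readonly Regex regex = new Regex(@"^\d+(?=\.)", RegexOptions.Compiled);

    public static void Main()
    {
        var list = new[] { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };

        // The whole match is the number, so no named group is needed.
        var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
                                      .GroupBy(s => regex.Match(s).Value)
                                      .Select(g => g.First());

        foreach (var item in distinctWithNumbers)
            Console.WriteLine(item);
        // Prints:
        // 1.one
        // 2. two
        // 300. three hundred
    }
}
```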