If we run the following LINQ query (thanks to @octavioccl for the help):
    var result = stringsList
        .GroupBy(s => s)
        .Where(g => g.Count() > 1)
        .OrderByDescending(g => g.Count())
        .Select(g => g.Key);
It gives us all the strings that occur in the list at least twice (exact matches only, i.e. Hamming distance = 0).
I was wondering whether there is an elegant solution that lets us specify the Hamming distance in the `Where` clause, so that we also get the strings that lie within a specified Hamming distance range. Everything I have tried so far uses either loops with a counter, which is ugly, or regex.
P.S. All the strings are of equal length.
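To make the goal concrete, here is a minimal sketch of the kind of query I have in mind. The names `Hamming`, `WithNeighbours`, and `maxDistance` are hypothetical and only for illustration. It assumes equal-length strings and compares every pair, so it is quadratic in the size of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HammingDemo
{
    // Hypothetical helper: counts the positions at which two
    // equal-length strings differ.
    public static int Hamming(string a, string b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Strings must have equal length.");
        return a.Zip(b, (x, y) => x != y).Count(differs => differs);
    }

    // Returns every distinct string that has at least one *other* entry
    // in the list within maxDistance (assumed semantics, not confirmed).
    public static List<string> WithNeighbours(IList<string> strings, int maxDistance)
    {
        return strings
            .Where((s, i) => strings
                .Where((t, j) => i != j)                  // skip the entry itself
                .Any(t => Hamming(s, t) <= maxDistance))  // any close neighbour?
            .Distinct()
            .ToList();
    }
}
```

With `maxDistance` set to 0 this should return the same strings as the `GroupBy` query above, although without the ordering by count.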
UPDATE
Many thanks to krontogiannis for his detailed answer. As I mentioned earlier, I want to get the list of strings whose Hamming distance is at or below the given threshold. His code works perfectly for that (thanks again).
The only thing remaining is to take the strings out of the result set and add them to a `List<string>`.
Basically this is what I want:
    List<string> outputList = new List<string>();
    foreach (string str in patternsList)
    {
        var rs = wordsList
            .GroupBy(w => hamming(w, str))
            .Where(h => h.Key <= hammingThreshold)
            .OrderByDescending(h => h.Key)
            .Select(h => h.Count());
        outputList.Add(rs); // I know it won't work, but just to show what is needed
    }
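One direction that might work is to filter the words directly instead of grouping them, then append the matches with `AddRange`. This is only a sketch under assumptions: the `Hamming` method below is a stand-in for the `hamming(w, str)` function (whose real implementation is not shown here), and the wrapper names `HammingFilter` and `Collect` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HammingFilter
{
    // Stand-in for hamming(w, str); assumes equal-length strings.
    static int Hamming(string a, string b) =>
        a.Zip(b, (x, y) => x != y).Count(differs => differs);

    public static List<string> Collect(
        IEnumerable<string> patternsList,
        IEnumerable<string> wordsList,
        int hammingThreshold)
    {
        var outputList = new List<string>();
        foreach (string str in patternsList)
        {
            // Keep the words themselves rather than group counts.
            var matches = wordsList
                .Where(w => Hamming(w, str) <= hammingThreshold)
                .OrderByDescending(w => Hamming(w, str)); // same ordering as above

            // AddRange accepts an IEnumerable<string>, unlike Add.
            outputList.AddRange(matches);
        }
        return outputList;
    }
}
```

Note that a word matching several patterns is added once per pattern; a `Distinct()` on the final list would remove those repeats if they are not wanted.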
Thanks