I have a list of strings that contain some duplicates. They are not EXACT duplicates as some contain spaces in different locations. Example of a list:

best shoes for flat feet
bestshoes for flat feet
best shoesfor flatfeet
best shoes for flatfeet

Now what I would like to do is remove all these duplicate strings, keeping only the one with the MOST spaces (we will assume this is the correct spacing).

Can anyone recommend a way to accomplish this?

Zach Johnson
  • When you face a problem you have never solved before, try to solve it in the most naive way. Say, exactly the same way you would solve it if you were given 200 such strings and a piece of paper. – zerkms Sep 17 '16 at 01:29
  • How do you resolve ties, when the same number of spaces appears at different places? E.g. `a..b.c` vs. `a.b..c` (I use dots in place of spaces to make them visible). Which one would you like to pick? – Sergey Kalinichenko Sep 17 '16 at 01:32
  • I would make a class that holds your string as a property, plus a list of lists of ints, where each inner list holds the indices of the spaces in one variant of your string. I think that would only pay off for a large number of variants, though. – Jorayen Sep 17 '16 at 01:35
  • Naively, you simply said "Give me the longest string of a list of strings"...are you SURE that is what you desire? THAT seems easy – Mark Schultheiss Sep 17 '16 at 01:43
  • You're right: when you look at it that way, that's all it is. It seems I was overthinking something extremely simple. – Zach Johnson Sep 17 '16 at 01:46

1 Answer

  • Start by constructing a "canonical" version of each string by removing all spaces (for example with `Regex.Replace`)
  • Use the canonical version as a key to group your strings
  • Pick the longest string among the ones in the same group

You can do it with LINQ's GroupBy:

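// Requires: using System.Linq; using System.Text.RegularExpressions;
var res = orig
    // Group strings that are identical once all whitespace is removed.
    .GroupBy(s => Regex.Replace(s, @"\s+", ""))
    // Within each group, keep the longest string, i.e. the one with the most spaces.
    .Select(g => g.OrderByDescending(s => s.Length).First())
    .ToList();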
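
To try the query on the sample list from the question, a minimal self-contained sketch could look like the following. The class name `SpacingDeduplicator` and the method name `KeepBestSpaced` are hypothetical and only used for illustration; the query itself is the same `GroupBy` pipeline as above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpacingDeduplicator
{
    // Hypothetical helper (not from the original answer): wraps the GroupBy query.
    public static List<string> KeepBestSpaced(IEnumerable<string> source)
    {
        return source
            // Key = the string with all whitespace removed.
            .GroupBy(s => Regex.Replace(s, @"\s+", ""))
            // Longest member of each group = the one with the most whitespace.
            .Select(g => g.OrderByDescending(s => s.Length).First())
            .ToList();
    }

    public static void Main()
    {
        var orig = new List<string>
        {
            "best shoes for flat feet",
            "bestshoes for flat feet",
            "best shoesfor flatfeet",
            "best shoes for flatfeet",
        };

        foreach (var s in KeepBestSpaced(orig))
            Console.WriteLine(s); // prints: best shoes for flat feet
    }
}
```

All four inputs collapse to the same key, `bestshoesforflatfeet`, so they form a single group and only the 24-character variant survives.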
Sergey Kalinichenko
  • `g.OrderByDescending(x => x.Count(c => c==' ')).First()` would be more expressive (as it matches the exact requirement), but indeed it is the same as `s => s.Length`, as the difference is only in spaces. – Alexei Levenkov Sep 17 '16 at 01:47
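
As a rough sketch of the variant suggested in this comment, the selector can count spaces instead of comparing lengths. The names `SpaceCountDeduplicator` and `KeepMostSpaces` are hypothetical. This also gives one possible answer to the tie question raised under the question: since LINQ's `OrderByDescending` is a stable sort and `GroupBy` keeps elements in source order, a tie goes to whichever variant appears first in the input. Whether that is the desired rule is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpaceCountDeduplicator
{
    // Hypothetical helper: same grouping, but ranks by the number of ' ' characters.
    public static List<string> KeepMostSpaces(IEnumerable<string> source)
    {
        return source
            .GroupBy(s => Regex.Replace(s, @"\s+", ""))
            // Stable sort: among equal space counts, the first one in the input wins.
            .Select(g => g.OrderByDescending(s => s.Count(c => c == ' ')).First())
            .ToList();
    }

    public static void Main()
    {
        // Both variants contain three spaces, so the first one listed is kept.
        var tied = new[] { "a  b c", "a b  c" };
        Console.WriteLine(KeepMostSpaces(tied)[0]); // prints: a  b c
    }
}
```

Note that the grouping key strips all whitespace (`\s+`), while this selector counts only the space character; for inputs containing tabs or other whitespace the two rankings could differ.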