
I have a large set of strings, including many duplicates. It is important that all of the duplicates have the same casing. So this set would fail the test:

String[] strings = new String[] { "a", "A", "b", "C", "b" };

...but this set would pass:

String[] strings = new String[] { "A", "A", "b", "C", "b" };

As I iterate through each string in strings, how can my program see that A is a case-insensitive duplicate of a (and thus fail), but allow the duplicate b through?

Dai
  • Your title says `HashSet` and your question has an array; is something missing? – Habib Apr 29 '13 at 05:41
  • If you use `Equals` to check two strings for equality, there's an overload that takes in a `StringComparison` enum value. You might want to use `StringComparison.OrdinalIgnoreCase`. If you need to make a `HashSet` (or `Dictionary`) use a specific comparison, construct the instance of `HashSet<>` (etc.) using the instance constructor that takes in an `IEqualityComparer`. In this case you might want to use the comparer `StringComparer.OrdinalIgnoreCase`. – Jeppe Stig Nielsen Apr 29 '13 at 05:52

3 Answers


One simple approach would be to create two sets: one using a case-insensitive string comparer, and one using a case-sensitive one. (It's not clear to me whether you want a culture-sensitive comparison or not, or in which culture.)

After construction, if the two sets have different sizes (Count), then there must be some elements which are equal by case-insensitive comparison but not equal by case-sensitive comparison.

So something like:

public static bool AllDuplicatesSameCase(IEnumerable<string> input)
{
    var sensitive = new HashSet<String>(input, StringComparer.InvariantCulture);
    var insensitive = new HashSet<String>(input, 
          StringComparer.InvariantCultureIgnoreCase);
    return sensitive.Count == insensitive.Count;
}
Jon Skeet
  • This works because one comparer is stricter than the other: _if_ `InvariantCulture` declares two strings equal, _then_ `InvariantCultureIgnoreCase` also declares them equal. There are two major issues to consider before choosing this solution. First, it creates two full copies of the source `input`. If the `input` is huge, this might be a waste of memory. Second, it iterates through the entire source twice. If there is a "counter-example" early in the `input` list, some other solutions can exit early instead of iterating over the rest. – Jeppe Stig Nielsen Apr 29 '13 at 07:46
  • @JeppeStigNielsen: Agreed. I was definitely going for simplicity over everything else. – Jon Skeet Apr 29 '13 at 07:48
  • Also note: If you create a `HashSet` using the default `IEqualityComparer`, you get an ordinal comparison (equivalent to `StringComparer.Ordinal`). This is distinct from the invariant culture. For example the invariant culture considers `"ss"` and `"ß"` equal. – Jeppe Stig Nielsen Apr 29 '13 at 07:49
  • @JeppeStigNielsen: Indeed - that's why I specified comparers in both sets :) – Jon Skeet Apr 29 '13 at 09:06
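
The memory and early-exit concerns raised in the comments above could be addressed with a single pass over the input. The following is only a sketch, not part of the original answer: the class and method names are made up for illustration, it uses ordinal rather than invariant-culture comparison, and it assumes the input contains no null strings.

```csharp
using System;
using System.Collections.Generic;

public static class CasingCheck
{
    // Hypothetical helper; the name is an assumption, not from the answer above.
    public static bool AllDuplicatesSameCaseSinglePass(IEnumerable<string> input)
    {
        // Keys are compared case-insensitively; the value is the first spelling seen.
        var firstSeen = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (var s in input)
        {
            if (firstSeen.TryGetValue(s, out var first))
            {
                // Same string ignoring case: the casing must match exactly.
                if (!string.Equals(first, s, StringComparison.Ordinal))
                {
                    return false; // stop at the first counter-example
                }
            }
            else
            {
                firstSeen.Add(s, s);
            }
        }

        return true;
    }
}
```

This keeps one dictionary instead of two sets and stops iterating as soon as a mismatch is found, at the cost of being a little longer than the two-set version.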

You could check each entry explicitly.

static bool DuplicatesHaveSameCasing(string[] strings)
{
  for (int i = 0; i < strings.Length; ++i)
  {
    for (int j = i + 1; j < strings.Length; ++j)
    {
      if (string.Equals(strings[i], strings[j], StringComparison.OrdinalIgnoreCase)
        && strings[i] != strings[j])
      {
        return false;
      }
    }
  }
  return true;
}

Comment: I chose to use ordinal comparison. Note that the != operator uses an ordinal, case-sensitive comparison. It is rather trivial to change this into a culture-dependent comparison.
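
One possible way to make the comparison configurable is to pass both `StringComparison` values in as parameters. This is a sketch under assumptions: the class name and the parameter names are invented for illustration, and the caller is responsible for passing a matching pair (for example `CurrentCultureIgnoreCase` with `CurrentCulture`).

```csharp
using System;

public static class PairwiseCasingCheck
{
    // Hypothetical variant of the method above with the comparisons as parameters.
    public static bool DuplicatesHaveSameCasing(
        string[] strings,
        StringComparison ignoreCase,     // e.g. StringComparison.CurrentCultureIgnoreCase
        StringComparison caseSensitive)  // e.g. StringComparison.CurrentCulture
    {
        for (int i = 0; i < strings.Length; ++i)
        {
            for (int j = i + 1; j < strings.Length; ++j)
            {
                // Equal when case is ignored, but different when it is not.
                if (string.Equals(strings[i], strings[j], ignoreCase)
                    && !string.Equals(strings[i], strings[j], caseSensitive))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Calling it with `StringComparison.OrdinalIgnoreCase` and `StringComparison.Ordinal` should behave like the ordinal version above.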

Jeppe Stig Nielsen

And another option using LINQ.

// Group strings without considering case
bool doesListPass = strings.GroupBy(s => s.ToUpper())
    // Check that all strings in each group have the same case
    .All(group => group.All(s => group.First() == s));

// Group strings without considering case
IEnumerable<string> cleanedList = strings.GroupBy(s => s.ToUpper())
    // Keep only the groups whose strings all have the same case
    .Where(group => group.All(s => group.First() == s))
    // Flatten the "passing" groups into a single sequence of strings
    .SelectMany(g => g);

Note: You can use ToUpper() or ToUpperInvariant() depending on your need.
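
As an alternative to upper-casing the key, `GroupBy` has an overload that takes an `IEqualityComparer<TKey>`, so the grouping itself can be case-insensitive. A minimal sketch, assuming ordinal comparison is acceptable and the input has no null strings; the class and method names are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqCasingCheck
{
    // Hypothetical helper; groups with a case-insensitive comparer instead of ToUpper().
    public static bool AllDuplicatesSameCase(IEnumerable<string> strings)
    {
        return strings
            // Group strings that are equal when case is ignored
            .GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
            // Every member of a group must match the group's first member exactly
            .All(g => g.All(s => string.Equals(s, g.First(), StringComparison.Ordinal)));
    }
}
```

This avoids allocating an upper-cased copy of every string and makes the choice of comparison explicit.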

Steven Wexler