
In part of my application I have an option that displays a list of albums by the current artist that aren't in the music library. To get this I call a music API to get the list of all albums by that artist and then I remove the albums that are in the current library.

To cope with differences in casing and the possibility of missing (or extra) punctuation in the titles, I have written an IEqualityComparer to use in the .Except call:

var missingAlbums = allAlbums.Except(ownedAlbums, new NameComparer());

This is the Equals method:

public bool Equals(string x, string y)
{
    // Check whether the compared objects reference the same data.
    if (ReferenceEquals(x, y)) return true;

    // Check whether any of the compared objects is null.
    if (x is null || y is null)
        return false;

    return string.Compare(x, y, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0;
}

This is the GetHashCode method:

public int GetHashCode(string obj)
{
    // Check whether the object is null
    if (obj is null) return 0;

    // Make lower case. How do I strip symbols?
    return obj.ToLower().GetHashCode();
}

This fails, of course, when the strings contain symbols. Because I'm not removing them before computing the hash code, two strings such as "Baa, baa, black sheep" and "Baa baa Black sheep" still produce different hash codes after conversion to lower case, so they are never treated as equal.

I have written a method that will strip the symbols, but that meant I had to guess what those symbols actually are. It works for the cases I've tried so far, but I'm expecting it to fail eventually. I'd like a more reliable method of removing the symbols.

Given that the CompareOptions.IgnoreSymbols exists, is there a method I can call that will strip these characters from a string? Or failing that, a method that will return all the symbols?

I have found the IsPunctuation method for characters, but I can't determine whether what this deems to be punctuation is the same as what the string compare option deems to be a symbol.

ChrisF
  • Do you want to remove all symbols from a string, keeping only letters? Possibly numbers too, to allow *Element3*? Or only certain chars like the comma and semicolon, to keep for example *Simon & Garfunkel*? –  Jun 15 '21 at 22:29
  • @OlivierRogier I need to remove all the symbols that are ignored when you use the `CompareOptions.IgnoreSymbols` flag on `string.Compare`. `GetHashCode` has to return the same value for two strings that would be deemed the same by the comparison. – ChrisF Jun 15 '21 at 22:30
  • Related: [Can I obtain the result string used for comparisons with CompareOptions?](https://stackoverflow.com/q/23118292) Some good (underrated) answers there. Konrad's is basically what first popped into my mind: "I'll bet that API is calling down to a native Win32 API which you could call directly. Let someone else do the tracking of what is and is not punctuation." Perhaps the best answer is to P/Invoke [`LCMapStringEx`](https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex) (if it's not already wrapped) with the `NORM_IGNORESYMBOLS` flag. – Cody Gray - on strike Jun 15 '21 at 22:43
  • Wow, there's a lot of diamonds and rep floating around this question and its answer and comments. – Flydog57 Jun 15 '21 at 22:54
  • @Flydog57: just goes to show, us newbies can still teach the old dogs a few new tricks. ;) – Peter Duniho Jun 15 '21 at 23:07

1 Answer


If you're going to use the CompareOptions enum, I feel like you might as well use it with the CompareInfo class that it's documented as being designed for:

"Defines the string comparison options to use with CompareInfo."

Then you can just use the GetHashCode(string, CompareOptions) method from that class (and even the Compare(string, string, CompareOptions) method if you like).
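A minimal sketch of how that might look, assuming a runtime where CompareInfo.GetHashCode(string, CompareOptions) is available (.NET Core 2.0+ or .NET Framework 4.7.1+, as far as I know). The class layout, the Options constant, the culture parameter and the sample album lists are illustrative assumptions, not code from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Sketch only: a comparer whose Equals and GetHashCode share one CompareInfo
// and one set of CompareOptions, so both agree on which characters are ignored.
public sealed class NameComparer : IEqualityComparer<string>
{
    // Hypothetical constant; the flags are the ones used in the question.
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols;

    private readonly CompareInfo _compareInfo;

    public NameComparer() : this(CultureInfo.CurrentCulture) { }

    public NameComparer(CultureInfo culture)
    {
        _compareInfo = culture.CompareInfo;
    }

    public bool Equals(string x, string y)
    {
        // Same reference (or both null) counts as equal.
        if (ReferenceEquals(x, y)) return true;

        // Exactly one of them is null.
        if (x is null || y is null) return false;

        return _compareInfo.Compare(x, y, Options) == 0;
    }

    public int GetHashCode(string obj)
    {
        // CompareInfo.GetHashCode throws on null, so handle it here.
        if (obj is null) return 0;

        // Hash under the same options that Equals compares with.
        return _compareInfo.GetHashCode(obj, Options);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up sample data for illustration.
        var allAlbums = new[] { "Baa, baa, black sheep", "Another Album" };
        var ownedAlbums = new[] { "Baa baa Black sheep" };

        var missingAlbums = allAlbums.Except(ownedAlbums, new NameComparer());

        foreach (var album in missingAlbums)
            Console.WriteLine(album);
    }
}
```

Because both methods go through the same CompareInfo with the same options, two strings that compare as equal should also hash the same, which is what Except relies on. Passing CultureInfo.InvariantCulture to the constructor may be worth considering if the results should not depend on the user's locale.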

Peter Duniho