
The scenario:

Our .NET Core 3.1 application has many file paths (Windows OS) that we need to compare and store in HashSets and Dictionaries using case-insensitive string comparison.

I tried to find an easy way to handle all these file path strings, and I found three options:

Option 1:

Use StringComparison.OrdinalIgnoreCase for string comparisons and StringComparer.OrdinalIgnoreCase for collections.

  • Advantage: No need to implement a new class; just use the existing framework.
  • Big disadvantage: It's very easy to forget these while developing new code (we have many strings in the application that are case-sensitive).
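A minimal sketch of Option 1, assuming the comparers are created in one place so new code has less opportunity to forget them. The class and method names (`OrdinalIgnoreCaseDemo`, `CreatePathSet`, `CreatePathDictionary`, `SamePath`) are hypothetical helpers, not framework APIs; only `StringComparer.OrdinalIgnoreCase` and `StringComparison.OrdinalIgnoreCase` come from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers that centralize the case-insensitive comparer for paths.
public static class OrdinalIgnoreCaseDemo
{
    // Set of paths whose membership checks ignore case using ordinal
    // (culture-independent) rules.
    public static HashSet<string> CreatePathSet() =>
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Dictionary keyed by path, using the same comparer for keys.
    public static Dictionary<string, T> CreatePathDictionary<T>() =>
        new Dictionary<string, T>(StringComparer.OrdinalIgnoreCase);

    // One-off comparison of two paths.
    public static bool SamePath(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

This does not enforce anything by itself; a `new HashSet<string>()` written elsewhere will still be case-sensitive.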

Option 2:

Create an IgnoreCaseString class that wraps a string and overrides Equals, GetHashCode, and operator == using StringComparison.OrdinalIgnoreCase.

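using System;

public sealed class IgnoreCaseString
{
    public IgnoreCaseString(string originalString)
    {
        // Reject null up front; otherwise Equals and GetHashCode would throw later.
        this.OriginalString = originalString ?? throw new ArgumentNullException(nameof(originalString));
    }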

    public string OriginalString { get; }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(this, obj))
            return true;

        if (!(obj is IgnoreCaseString filePathString))
            return false;

        return OriginalString.Equals(filePathString.OriginalString, StringComparison.OrdinalIgnoreCase);
    }

    public static bool operator ==(IgnoreCaseString a, IgnoreCaseString b)
    {
        return Equals(a, b);
    }

    public static bool operator !=(IgnoreCaseString a, IgnoreCaseString b)
    {
        return !Equals(a, b);
    }

    public override int GetHashCode()
    {
        return OriginalString.GetHashCode(StringComparison.OrdinalIgnoreCase);
    }
}
  • Advantage: Very efficient, uses the existing string methods, and is easy to use.
  • Disadvantage: Need to maintain the IgnoreCaseString class.

Option 3:

Use the Uri class, which is designed for this.

  • Advantage: Existing class, easy to use.
  • Disadvantage: Memory and performance overhead.

The question:

  • Is there any built-in or easy way to do it?
  • Is there another option that I missed?
itaiy
  • Note that case sensitivity on file names depends on the OS, not the application. On Windows, it would be fine to ignore the case, but on Linux you must preserve and compare it without conversions. Why do you even care about this? – Alejandro Jul 29 '20 at 14:34
  • You can use a code analyzer (like e.g. FxCop or StyleCop or create your own) - they run both on build and on writing code, so you won't "forget" to use `StringComparison.OrdinalIgnoreCase`. – dymanoid Jul 29 '20 at 14:35
  • If you want speed, I would store the all-uppercase or all-lowercase versions of the paths along with their proper capitalization (if needed). Case-sensitive comparison is going to be faster, and you can be sure what the key will be in a hashset or dictionary. – itsme86 Jul 29 '20 at 14:35
  • #2 is already available through the [StringComparer-derived classes](https://learn.microsoft.com/en-us/dotnet/api/system.stringcomparer?view=netcore-3.1) – Panagiotis Kanavos Jul 29 '20 at 14:37
  • I would go with the third option, because it most likely knows how to handle case depending on the OS the app is running on, making your program easier to port to different OSs. – Ackdari Jul 29 '20 at 14:39
  • @itsme86 how so? Dictionary works through comparers and allows [specifying a different comparer](https://learn.microsoft.com/en-us/dotnet/api/system.stringcomparer?view=netcore-3.1) in the constructor. Instead of the case sensitive one, you can use a case-insensitive comparer – Panagiotis Kanavos Jul 29 '20 at 14:39
  • @PanagiotisKanavos You can, but it will be slower. – itsme86 Jul 29 '20 at 14:40
  • @PanagiotisKanavos comparing strings case-sensitively is always faster than comparing them case-insensitively, because in the first case the algorithm can simply compare the char values, but in the second case different char values do not automatically mean unequal. This is especially important for case-insensitive comparison of non-English alphabets, because some chars map to multiple chars in upper or lower case. That is why in Rust [`to_uppercase`](https://doc.rust-lang.org/std/primitive.char.html#method.to_uppercase) returns an iterator and not a char. – Ackdari Jul 29 '20 at 14:45
  • This isn't Rust, and converting to uppercase allocates a new string. Converting every value to uppercase before trying to retrieve an item can be more expensive than any benefits from using the culture-sensitive `String.Equals` that's going to be used by `EqualityComparer.Default` – Panagiotis Kanavos Jul 29 '20 at 14:46
  • Have you looked at `Dictionary`? It has a constructor that takes an `IEqualityComparer`, so you can create a case-insensitive Dictionary. – Flydog57 Jul 29 '20 at 14:46
  • @PanagiotisKanavos Hence the "if you want speed" qualifier. This is why there's no one best way to do things. We don't know what the OP's constraints or priorities are. Is it write once read many? Write once, read once? How many file paths? I'm not going to worry about an additional 1MB of memory because this isn't 1990 anymore. Lots of factors to consider when choosing a solution, so I was providing an option that no one else had presented yet in case that worked best for OP. Not sure why you're trying to shoot it down so hard. – itsme86 Jul 29 '20 at 14:54
  • @itsme86 allocations are CPU operations too (so are deallocations, but those are paid "later"). It's not about RAM, it's about ending up using more CPU instead of less. Besides, [it looks like someone asked something similar already](https://stackoverflow.com/questions/2256453/using-invariantcultureignorecase-instead-of-toupper-for-case-insensitive-string) and the answer was ... the Turkish `I`. If you use `ToUpper()` you won't know what you end up with. – Panagiotis Kanavos Jul 29 '20 at 14:59
  • @PanagiotisKanavos We already know that. But if you're storing it once and reading it several times, it's a CPU win, not a loss. Options are good. Stop trying to suppress them. – itsme86 Jul 29 '20 at 15:01
  • @itsme86 I'm not trying to suppress. I'm not talking about the stored keys, I'm talking about the *lookup values*. And I've encountered such bugs in the past, even in VS itself - a Turkish programmer asked about an early Roslyn Camel-case refactoring that produced invalid variable names. I was the guy that posted the issue then had to find support from non-Americans in MS to persuade the product manager that no, culture sensitive casing in variable names is not required, so closing `as-designed` was wrong. Took a while – Panagiotis Kanavos Jul 29 '20 at 15:05
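itsme86's suggestion of storing a normalized key alongside the original casing can be sketched as follows. The class name `NormalizedPathIndex` is hypothetical. It uses `ToUpperInvariant` rather than `ToUpper` to sidestep the culture-specific mappings (such as the Turkish `I`) raised in the comments. Note that every lookup still allocates a normalized copy of the query string, which is the cost Panagiotis Kanavos points out; whether this beats `StringComparer.OrdinalIgnoreCase` depends on the read/write pattern and should be measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index: keys are upper-cased once with invariant rules,
// values keep the original casing of the path.
public sealed class NormalizedPathIndex
{
    private readonly Dictionary<string, string> _byKey = new Dictionary<string, string>();

    // Invariant upper-casing avoids culture-dependent results (e.g. Turkish I).
    private static string Normalize(string path) => path.ToUpperInvariant();

    // Returns false if an entry with the same normalized key already exists.
    public bool Add(string path) => _byKey.TryAdd(Normalize(path), path);

    // Each lookup allocates a normalized copy of the query string.
    public bool TryGetOriginal(string path, out string original) =>
        _byKey.TryGetValue(Normalize(path), out original);
}
```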

1 Answer


file paths that we need to compare and store

That already screams case sensitivity. Windows (and macOS with its default file system format) has case-insensitive paths; Linux and other Unix-like systems are case-sensitive, for better or worse. You should play it safe and stick to case sensitivity.
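If the code ever has to run on more than one OS, one option along the lines of the comments above is to choose the comparer once, based on the platform. This is a sketch under a stated assumption: treating Windows and macOS as case-insensitive is only a heuristic, because case sensitivity really belongs to the file system (a volume, or on recent Windows even a single directory), not to the OS. The class name `PathComparison` is hypothetical; `RuntimeInformation.IsOSPlatform` is available on .NET Core 3.1.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical single source of truth for how paths are compared.
public static class PathComparison
{
    // Heuristic: assumes default file system settings on each platform.
    private static readonly bool IgnoreCase =
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ||
        RuntimeInformation.IsOSPlatform(OSPlatform.OSX);

    // For HashSet<string> / Dictionary<string, T> constructors.
    public static StringComparer Comparer =>
        IgnoreCase ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;

    // For string.Equals, StartsWith, EndsWith, etc.
    public static StringComparison Comparison =>
        IgnoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
}
```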

In fact I'd challenge you to think of how you would get a path of the wrong casing on any OS -- manual entry from a spreadsheet or something? In that case I'd add a "Browse" button that opens a standard open file dialog and gives you the correct string.

#1, Use StringComparison.OrdinalIgnoreCase and use StringComparer.OrdinalIgnoreCase.

It's a fine solution if you set aside the first part. The "forgetting" problem can usually be handled with custom code analyzers, and the approach itself is fast and memory-efficient.

#2, Create an IgnoreCaseString class that wraps the string class and overrides Equals, GetHashCode and operator == with StringComparison.OrdinalIgnoreCase.

Perhaps a better name would be CaseInsensitiveString, since "Ignore" implies an action being performed, while your string doesn't perform any action; it just passively behaves differently.

This solution uses more memory than the previous one, and it doesn't let you use modern constructs like Span for less memory overhead in tight string processing scenarios.
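To illustrate what stays available when paths remain plain `string`s: `ReadOnlySpan<char>` overloads accept a `StringComparison`, so parts of a path can be compared case-insensitively without allocating substrings. A minimal sketch; the helper names `SpanPathChecks`, `HasExtension`, and `SameDirectory` are hypothetical, while `Path.GetExtension(ReadOnlySpan<char>)`, `Path.GetDirectoryName(ReadOnlySpan<char>)`, and the `MemoryExtensions.Equals(..., StringComparison)` extension exist on .NET Core 2.1 and later.

```csharp
using System;
using System.IO;

// Hypothetical helpers comparing slices of plain path strings without allocating.
public static class SpanPathChecks
{
    // Path.GetExtension(ReadOnlySpan<char>) returns a slice of the input.
    public static bool HasExtension(string path, string extension) =>
        Path.GetExtension(path.AsSpan())
            .Equals(extension.AsSpan(), StringComparison.OrdinalIgnoreCase);

    // Compares only the directory parts of two paths, again as slices.
    public static bool SameDirectory(string a, string b) =>
        Path.GetDirectoryName(a.AsSpan())
            .Equals(Path.GetDirectoryName(b.AsSpan()), StringComparison.OrdinalIgnoreCase);
}
```

A wrapper type would have to expose its own span-based members to offer the same comparisons with its case-insensitive rules.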

Your implementation is also broken; re-read your operators:

public static bool operator ==(IgnoreCaseString a, IgnoreCaseString b)
{
    return string.Equals(a, b);
}
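For comparison, one way to write the whole wrapper so that the operators are null-safe and cannot recurse into themselves is to route everything through a strongly typed `IEquatable<T>.Equals`. This is a sketch, not the asker's code; it uses the `CaseInsensitiveString` name suggested above, and the `Value` property name is an assumption.

```csharp
using System;

// Hypothetical complete wrapper using ordinal, case-insensitive equality.
public sealed class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    public CaseInsensitiveString(string value) =>
        Value = value ?? throw new ArgumentNullException(nameof(value));

    public string Value { get; }

    // Strongly typed equality; EqualityComparer<T>.Default (used by HashSet
    // and Dictionary) calls this directly for IEquatable<T> types.
    public bool Equals(CaseInsensitiveString other) =>
        other is object && string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => Equals(obj as CaseInsensitiveString);

    // Must agree with Equals: values equal ignoring case get equal hashes.
    public override int GetHashCode() => StringComparer.OrdinalIgnoreCase.GetHashCode(Value);

    // "is null" does not invoke the overloaded operator, so there is no recursion.
    public static bool operator ==(CaseInsensitiveString a, CaseInsensitiveString b) =>
        a is null ? b is null : a.Equals(b);

    public static bool operator !=(CaseInsensitiveString a, CaseInsensitiveString b) => !(a == b);

    public override string ToString() => Value;
}
```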

#3, Use the Uri class, which is designed for it.

It's meant for URIs, of which file:// is one scheme. It's not meant for local disk paths, though of course it can encode them with the file:// scheme. I agree with you; it seems like overkill.

Panagiotis Kanavos
Blindy
  • Repeating wrong information doesn't magically make you right. You said that before. – Blindy Jul 29 '20 at 15:09
  • Thank you for your answer. 1) I've fixed the `operator ==` and I've added to the question that the application is relevant only to Windows. 2) How can I write a code analyzer that knows which strings are file paths? 3) About `Span`, I'm still exposing the original string, so I can use it. – itaiy Jul 30 '20 at 05:47
  • 2) naming convention, perhaps? That's how ASP.NET MVC Core does it. 3) You can use it, but it won't follow your case insensitive rules when doing comparisons with `Span`. That only makes it worse. – Blindy Jul 30 '20 at 14:41