8

This question is not about managing Windows pathnames; I used that only as a specific example of a case-insensitive string. (And I if I change the example now, a whole bunch of comments will be meaningless.)


This may be similar to Possible to create case insensitive string class?, but there isn't a lot of discussion there. Also, I don't really care about the tight language integration that string enjoys or the performance optimizations of System.String.

Let's say I use a lot of Windows pathnames which are (normally) case-insensitive (I'm not actually concerned with the many details of actual paths like \ vs. /, \\\\ being the same as \, file:// URLs, .., etc.). A simple wrapper might be:

sealed class WindowsPathname : IEquatable<WindowsPathname> /* TODO: more interfaces from System.String */
{
    public WindowsPathname(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        Value = path;
    }

    public string Value { get; }

    public override int GetHashCode()
    {
        return Value.ToUpperInvariant().GetHashCode();
    }

    public override string ToString()
    {
        return Value.ToString();
    }

    public override bool Equals(object obj)
    {
        var strObj = obj as string;
        if (strObj != null)
            return Equals(new WindowsPathname(strObj));

        var other = obj as WindowsPathname;
        if (other != null)
            return Equals(other);

        return false;
    }
    public bool Equals(WindowsPathname other)
    {
        // A LOT more needs to be done to make Windows pathanames equal.
        // This is just a specific example of the need for a case-insensitive string
        return Value.Equals(other.Value, StringComparison.OrdinalIgnoreCase);
    }
}

Yes, all/most of the interfaces on System.String should probably be implemented; but the above seems like enough for discussion purposes.

I can now write:

var p1 = new WindowsPathname(@"c:\foo.txt");
var p2 = new WindowsPathname(@"C:\FOO.TXT");
bool areEqual = p1.Equals(p2); // true  

This allows me to "talk about" WindowsPathnames in my code rather than a implementation detail like StringComparison.OrdinalIgnoreCase. (Yes, this specific class could also be extended to handle \ vs / so that c:/foo.txt would be equal to C:\FOO.TXT; but that's not the point of this question.) Furthermore, this class (with additional interfaces) will be case-insensitive when instances are added to collections; it would not necessary to specify an IEqualityComparer. Finally, a specific class like this also makes it easier to prevent "non-sense" operations such as comparing a file system path to a registry key.

The question is: will such approach be successful? Are there any serious and/or subtle flaws or other "gotchas"? (Again, having to do with trying to setup a case-insensitive string class, not managing Windows pathnames.)

Community
  • 1
  • 1
Ðаn
  • 10,934
  • 11
  • 59
  • 95
  • 2
    Be aware that on more '/' or '\' can make your windows path different string but equals window path. Check this path normalization to learn more http://stackoverflow.com/questions/2281531/how-can-i-compare-directory-paths-in-c – Perfect28 Oct 09 '15 at 13:20
  • 5
    I'm not sure what the problem is with using a string insensitive compare where necessary, since the only time you care about it is during a compare. It seems like using a sledgehammer to drive a picture nail, a lot of overhead for an easy to solve problem using existing framework code. – Ron Beyer Oct 09 '15 at 13:31
  • Its also easy to catch in a unit test. I'd also argue that its easy to forget to change all path strings to the case insensitive wrapper, especially when having multiple developers. Both of which can be caught in unit tests, only one of which adds additional overhead. – Ron Beyer Oct 09 '15 at 13:36
  • 6
    This is a big mistake, starting with the case insensitive comparison being heavily culture dependent, a property that can change, and ending with the exact path name comparison and parsing rules being a file system driver implementation detail. Basic reason why .NET doesn't have this. Only way to do it correctly is hit the disk for both and check that you get the same filesystem path back. – Hans Passant Oct 09 '15 at 13:43
  • 1
    Would also be simple to add an extension method for strings, like `public static bool InsensitveCompare(this string a, string b) { return string.Compare(a, b, true) == 0; }` But belaboring your point about this just being an example, where do you stop? `InsensitiveFirstName` `InsenstiveLastName`, `InsensitiveDog`, `InsensitiveCat` ad nauseam. Its just bloat for an ever-growing issue that is solved with a single line of code. – Ron Beyer Oct 09 '15 at 13:44
  • 4
    You could not have picked a worse possible example if you tried :) – Hans Passant Oct 09 '15 at 13:48
  • Why do you need it in first place? Common approach is to convert both strings [`ToLower`](https://msdn.microsoft.com/en-us/library/system.string.tolower.aspx) and then do comparison at the place. Unless you want to use `WindowsPathnames` as a key (e.g. in `Dictionary`), but then you can use `ToLower` for a given string every time before adding to a dictionary or searching. Do you have a scenario where `WindowsPathnames` is a must? – Sinatr Oct 09 '15 at 14:15
  • @Dan, ok, use [`ToUpper`](https://msdn.microsoft.com/en-us/library/system.string.toupper.aspx), it's irrelevant (`ToLower` probably doesn't works for some languages, while `ToUpper` does). You didn't answer why do you need this type and not `"some normal string".ToUpper()`. – Sinatr Oct 09 '15 at 14:22
  • *I can return the right string* - still unclear what you mean. You **do not** modify original string (which is saved as `string`), you just call `myString.ToUpper()` every time when **you need**. Actually, what are you going to do with such *caseless* string? You posted the code, but you forgot to add examples of usage and why this `WindowsPathname` is a must. I don't see a point in it, nor I really understand what `string-which-really-is-a-pathname` means, because you stated it's not a pathname in comments earlier. – Sinatr Oct 09 '15 at 14:36
  • @Dan: I still think this is a classic over-engineering approach to a fairly straightforward problem. – code4life Oct 09 '15 at 17:50
  • As an aside, your Equals method should *not* return `true` if it is passed a `string`. You've made your equality asymmetric. Bad idea. – Jon Skeet Oct 12 '15 at 07:38
  • Case insensitivity is a property of string comparers, not a property of a given string. What would you put in that class? Your sample actually illustrates that. It's not a CaseInsensitiveString class, but a utility class for a specific problem (path names). I think you give the answer yourself: the approach is ok for path names (sort of) but has no meaning in general. – Simon Mourier Oct 12 '15 at 13:53
  • I Never said the contrary. These are all valid examples of usage of string comparison methods, they do not demonstrate the need for a CaseInsensitiveString class. – Simon Mourier Oct 12 '15 at 14:17

5 Answers5

9

I would create an immutable struct that hold a string, converting the string in the constructor to a standard case (e.g. lowercase). Then you could also add the implicit operator to simplify the creation and override the compare operators. I think this is the simplest way to achieve the behaviour, plus you get only a small overhead (the conversion is only in the constructor).

Here's the code:

public struct CaseInsensitiveString
{
    private readonly string _s;

    public CaseInsensitiveString(string s)
    {
        _s = s.ToLowerInvariant();
    }

    public static implicit operator CaseInsensitiveString(string d)
    {
        return new CaseInsensitiveString(d);
    }

    public override bool Equals(object obj)
    {
        return obj is CaseInsensitiveString && this == (CaseInsensitiveString)obj;
    }

    public override int GetHashCode()
    {
        return _s.GetHashCode();
    }

    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return x._s == y._s;
    }

    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return !(x == y);
    }
}

Here is the usage:

CaseInsensitiveString a = "STRING";
CaseInsensitiveString b = "string";

// a == b --> true

This works for collections as well.

AleFranz
  • 771
  • 1
  • 7
  • 13
  • That's a good question. In this particular case is just a matter of personal preference, as it is immutable and it encapsulates a single field. What's matter is not only that System.String is a reference type but that is also immutable. For this reason, even if the field was public, it would be immutable. So it will be perfectly fine to use a class and the behavior would have been the same. – AleFranz Oct 13 '15 at 12:30
  • 3
    Besides immutability there's performance to consider and `struct` wins here too; it's a lightweight wrapper around a single reference/pointer and is cheap to copy around (probably as cheap as the reference itself), yet does not suffer from an additional level of indirection that you would need with a `class`. – andrewjsaid Oct 13 '15 at 23:24
3

So you want a something that converts a string to an object, and if you convert two strings to two of those objects, you want to be able to compare these objects for equality with your own set of rules about the equality of the two objects.

In your example it is about upper and lower case, but it could also be about forward slashes and backward slashes, maybe you even want to define that the "word" USD equals to $.

Suppose you divide the collection of all possible strings in subcollections of strings that you'd define to be equal. In that case "Hello" would be in the same subcollection as "HELLO" and "hElLO". Maybe "c:\temp" would be in the same collection as "c:/TEMP".

If you could find something to identify your subcollection, then you could say that all strings that belong to the same subcollection would have the same identifier. Or in other words: all strings that you defined equal would have the same identifier.

If that would be possible, then it would be enough to compare the subcollection identifier. If two strings have the same subcollection identifier, then they belong to the same subcollection and thus are considered equal according to our equality definition.

Let's call this identifier the normalized value of the string. The constructor of your CaseInsensitiveString could convert the input string into the normalized value of the string. To check two objects for equality all we have to do is check if they have the same normalized value.

An example of the normalization of a string would be:

  • Make the string lower case
  • make all slashes backward slashes
  • convert all words USD to $
  • remove all thousand separators in numbers without thousand seperator
  • etc, depending on when you want the strings to be equal.

According to the above the following Strings would all lead to the same normalized string:

  • White House $ 1,000,000
  • White House $ 1000000
  • white house USD 1000000

We can define anything as a normalized string, as long as all strings that we define equal have the same normalized string. A good example would be

  • white house $ 1000000

Note: I'm not going into detail about how to find words like USD and thousand separator. The importance is that you understand the meaning of normalized string.

Having said this, the only difficult part is to find the stringIdentifier. The rest of the class is fairly straightforward:

Code for the construction. The constructor takes a string and determines the subcollection it belongs to. I also added a default constructor.

public class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    private string normalized = "";

    public CaseInsensitiveString(string str)
    {
        this.normalized = Normalize(str);
    }

    public CaseInsensitiveString()
    {
         this.Normalize = Normalize(null);
    }
}

Equality: by definition, two objects are the same if they have the same normalized value

See MSDN How to Define Value Equality for a Type

public bool Equals (CaseInsensitiveString other)
{
    // false if other null
    if (other != null) return false;

    // optimization for same object
    if (object.ReferenceEquals(this, other) return true;

    // false if other a different type, for instance subclass
    if (this.Gettype() != other.Gettype()) return false;

    // if here: everything the same, compare the stringIdentifier
    return this.normalized==other.normalized;
}

Note that this last line is the only code where we do actual equality checking!

All other equality functions only use the Equals function defined above:

public override bool Equals(object other)
{
    return this.Equals(other as CaseInsensitiveString);
}

public override int GetHashCode()
{
    return this.Normalized.GetHashCode();
}

public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
{
    if (object.ReferenceEquals(x, null)
    {   // x is null, true if y also null
        return y==null;
    }
    else
    {   // x is not null
        return x.Equals(y);
    }
}

public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
{
    return !operator==(x, y);
}

So now you can do the following:

var x = new CaseInsensitiveString("White House $1,000,000");
var y = new CaseInsensitiveString("white house $1000000");
if (x == y)
    ...

Now the only thing we have to implement is the Normalize function. Once you know when two strings are considered equal you know how to normalize.

Suppose consider two strings equal if they are equal case insensitive and forward slashes are the same as backward slashes. (bad English)

If the normalize function returns the same string in lower case with all backward slashes, then two strings that we consider equal will have the same normalized value

private string Normalize(string str)
{
    return str.ToLower().Replace('/', '\');
}
Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116
1

A shorter and more lightweight approach might be to create an extension method:

public static class StringExt
{
    public static bool IsSamePathAs(this string @this, string other)
    {
        if (@this == null)
            return other == null;

        if (object.ReferenceEquals(@this, other))
            return true;

        // add other checks
        return @this.Equals(other, StringComparison.OrdinalIgnoreCase);
    }
}

This requires far less coding than creating a whole separate class, has no performance overhead (might even get inlined), no additional allocations, and also expresses the intent pretty clearly IMO:

var arePathsEqual = @"c:\test.txt".IsSamePathAs(@"C:\TEST.txt");
Ðаn
  • 10,934
  • 11
  • 59
  • 95
vgru
  • 49,838
  • 16
  • 120
  • 201
  • @Dan: Could you explain that a bit? Not sure what you meant. Are you saying you have a personal "rule" against writing extension methods for strings? 1) Why wouldn't you use them on collections? You use them on `IEnumerable` without problems. So it's the indexer that bothers you? Did you want to say they shouldn't *mutate* the object, instead? Do you think you can mutate strings? 2) `String` is not even a collection. It has a (readonly) indexer, but it only implements `IEnumerable` and no other collection interfaces. So, when you write `"stuff".Skip(1)`, yes, you're using extensions. – vgru Oct 12 '15 at 08:38
0

Hmm... I don't think string case is the only challenge you have. Let me ask you a couple of questions:

Is c:\myPath the same as c:/myPath? How about file:////c:/myPath? Or how about \\myMachine\c$\myPath?

I kind of understand where you are headed and what you want accomplished, but it just seems like you're tunnel-visioned on a simple problem - why build a framework that does what a simple .ToLower() vs ToLower() comparison does?

That being said, if your problem scope, in addition to the string casing, involves trying to assess absolute equality of two given paths, it makes sense to write up a class. But that would require a lot more involved solution than what you are proposing...

Ðаn
  • 10,934
  • 11
  • 59
  • 95
code4life
  • 15,655
  • 7
  • 50
  • 82
0

First, Call a Spade a Spade
You must define what is the clear responsibility of the class.

Either you want the class to manage Windows Path names and then you can't discard all remarks regarding that because the "code managing case" will be merged with the "code managing paths". Coupling will make you then unable to test (and ensure proper behaviour) casing without path consideration.

Or you want to implement a CaseInvariantString and then you name it adequately (and maybe use it in another class named WindowsPathname).

For references about Class Responbility, Cohesion, Coupling and other great concepts, I would recommend the following books:

  • Clean Code from Robert C. Martin (Uncle Bob)
  • Code Complete from Steeve McConnell

Second, wrapping strings inside a class to check for case invariance can be viewed as wrapping integer in a PositiveInteger class. It can (and will) be considered by some as over-engineering. It is a common tendency from all developers to try to reach the pinnacle of object oriented dogma. Here it seems like the practice of wrapping all value types in class (like int into an ID class). However, don't forget to ask you questions.

  • What is the cost of adopting such practice?
  • What are the benefits?
  • What are the difficulties it may lead to?
  • Could I adopt the general approach for all my projects?
  • Do I have an enactment of my tech lead/architect (or similar authority) that it is a good practice?

Finally, as a simple technical point. You shouldn't create a string inside your class. It's deleterious for performance. In fact, because strings are invariant, when you do a ToUpperInvariant() in GetHashCode() it creates a new String.

And for the sake of path invariance... It doesn't work outside Windows
(For Mono, obviously "/foo" != "/Foo").

Ðаn
  • 10,934
  • 11
  • 59
  • 95
Fab
  • 14,327
  • 5
  • 49
  • 68