Why I need this: I have to handle Windows file names, in particular to use them as keys. Two keys should be distinct if and only if the corresponding files can coexist on a Windows filesystem. I could convert the names to upper or lower case.

This page says that ToUpperInvariant() should be used instead of ToLowerInvariant(), because:

A small group of characters, when they are converted to lowercase, cannot make a round trip.

Then, this answer provides examples of "ϱ", "ς", "ß", which may have this issue.

So the risk would be one of these:

  • there are two symbols a1 and a2 that would clash on the filesystem, but ToLowerInvariant() leaves them unchanged and therefore different.
  • there are two symbols A1 and A2 that ToLowerInvariant() converts to the same lowercase symbol, yet they do not clash on the filesystem.
  • anything else I missed?

It is then assumed that ToUpperInvariant() is somehow better than ToLowerInvariant(), that is, that it produces a different and correct result in these cases.
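One way to look for candidate characters is to compare the two equivalences directly. The following is only a rough sketch: the class name CaseFoldingScan and its helpers are made up for illustration, it covers only single UTF-16 code units outside the surrogate range, and it says nothing about what the filesystem itself does. It lists pairs of characters that one invariant conversion treats as equal and the other does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaseFoldingScan
{
    // Groups all non-surrogate UTF-16 code units by the given key function and
    // returns only the groups with more than one member, i.e. characters that
    // the key function treats as equivalent.
    public static List<List<char>> CollidingGroups(Func<string, string> key)
    {
        var groups = new Dictionary<string, List<char>>();
        for (int code = 0; code <= 0xFFFF; code++)
        {
            char c = (char)code;
            if (char.IsSurrogate(c)) continue; // a lone surrogate is not valid text
            string k = key(c.ToString());
            if (!groups.TryGetValue(k, out var members))
                groups[k] = members = new List<char>();
            members.Add(c);
        }
        return groups.Values.Where(g => g.Count > 1).ToList();
    }

    // Yields pairs that keyA maps to the same key while keyB keeps them apart.
    public static IEnumerable<(char, char)> Disagreements(
        Func<string, string> keyA, Func<string, string> keyB)
    {
        foreach (var group in CollidingGroups(keyA))
            for (int i = 0; i < group.Count; i++)
                for (int j = i + 1; j < group.Count; j++)
                    if (keyB(group[i].ToString()) != keyB(group[j].ToString()))
                        yield return (group[i], group[j]);
    }

    public static void Main()
    {
        Func<string, string> lower = s => s.ToLowerInvariant();
        Func<string, string> upper = s => s.ToUpperInvariant();

        foreach (var (a, b) in Disagreements(lower, upper))
            Console.WriteLine($"equal by lower only: U+{(int)a:X4} U+{(int)b:X4}");
        foreach (var (a, b) in Disagreements(upper, lower))
            Console.WriteLine($"equal by upper only: U+{(int)a:X4} U+{(int)b:X4}");
    }
}
```

Any pair this prints is only a candidate: whether the two names actually clash still has to be checked against the filesystem, and the list may differ between .NET runtimes and OS versions.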

I have tried the symbols from the linked answer, and none of them is changed by ToLowerInvariant() or ToUpperInvariant(); even "ß" and "ẞ" are treated as independent. Indeed, I can even create two files whose names differ only by those symbols, and they do not clash.
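A minimal sketch of that kind of check, assuming a plain console program (the class name SymbolCheck and the helper IsUntouched are made up for illustration). No expected output is given here because the result may depend on the .NET runtime and OS version.

```csharp
using System;

public static class SymbolCheck
{
    // True when neither invariant conversion changes the string.
    public static bool IsUntouched(string s) =>
        s.ToLowerInvariant() == s && s.ToUpperInvariant() == s;

    public static void Main()
    {
        // The symbols from the linked answer, plus capital sharp s.
        foreach (var s in new[] { "ϱ", "ς", "ß", "ẞ" })
        {
            Console.WriteLine(
                $"{s}: lower={s.ToLowerInvariant()} upper={s.ToUpperInvariant()} " +
                $"untouched={IsUntouched(s)}");
        }
    }
}
```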

So, the question is: what are actual examples where the equivalence defined by ToLowerInvariant() is wrong (does not match the Windows filesystem)?

max630
  • Wouldn't it be better to use the filesystem to determine whether the filesystem considers names to be equivalent? – Damien_The_Unbeliever Dec 11 '18 at 07:25
  • @Damien_The_Unbeliever no I don't think so. There are many reasons – max630 Dec 11 '18 at 07:57
  • You got this tiger by the wrong tail, using an invariant case conversion is exactly what you *don't* want to do. Because the file system does not use invariant casing rules. The casing table is written to an NTFS volume when it gets formatted, using the OS casing table that was in effect at the time the format was done. The only real hope for success you'd have is that the machine didn't move too far. An illustrative blog post [is here](http://archives.miloush.net/michkap/archive/2007/10/24/5641619.html). And beware that the .NET casing rules are subtly different from the OS rules, joy. – Hans Passant Dec 11 '18 at 08:16
  • @HansPassant So, for example, if I change OS settings to German, and maybe format a new drive, those files may clash? Wow. Thank you, I'll check it, and the pairs from the blog post – max630 Dec 11 '18 at 08:32
  • I'm not sure but it seems like the casing table does not depend on the settings, it only changes with Windows version. It's a bit of relief, and does not invalidate the question entirely. – max630 Dec 11 '18 at 09:51

1 Answer

Not a real answer to your question, just a small nitpick that is too long to put into a comment.

A small group of characters, when they are converted to lowercase, cannot make a round trip.

This is certainly valid in the Greek culture, but it doesn't apply when you are using the invariant culture. In the invariant culture, the letters "ϱ", "ς", "ß" are not uppercased at all (see the example below).

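    using System.Globalization;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class GreekRhoRoundTripTests
    {
        // Invariant culture: "ϱ" is expected to be left alone by
        // ToUpperInvariant(), so the round trip returns the original string.
        [TestMethod]
        public void GreekRho_ToUpper_ToLower_InvariantCulture()
        {
            var original = "ϱ";
            var upper = original.ToUpperInvariant();
            var lower = upper.ToLowerInvariant();
            Assert.AreEqual(original, lower);
        }

        // Greek culture: "ϱ" is uppercased, and lowercasing the result is
        // expected to give a different string, so the round trip fails.
        [TestMethod]
        public void GreekRho_ToUpper_ToLower_GreekCulture()
        {
            var greek = CultureInfo.CreateSpecificCulture("el-GR");
            var original = "ϱ";
            var upper = original.ToUpper(greek);
            var lower = upper.ToLower(greek);
            Assert.AreNotEqual(original, lower);
        }
    }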
Jakub Šturc