Is it possible to convert a string to ordinal upper or lower case, similar to the invariant conversions?

string upperInvariant = "ß".ToUpperInvariant();
string lowerInvariant = "ß".ToLowerInvariant();
bool invariant = upperInvariant == lowerInvariant; // true

string upperOrdinal = "ß".ToUpperOrdinal(); // SS
string lowerOrdinal = "ß".ToLowerOrdinal(); // ss
bool ordinal = upperOrdinal == lowerOrdinal; // false

How can ToUpperOrdinal and ToLowerOrdinal be implemented?

Edit: How do I get the ordinal string representation? Likewise, how do I get the invariant string representation? Maybe that's not possible, since in the above case it might be ambiguous, at least for the ordinal representation.

Edit2:

string.Equals("ß", "ss", StringComparison.InvariantCultureIgnoreCase); // true

but

"ß".ToLowerInvariant() == "ss"; // false
fubo
Wouter
  • 2
    @diiN__________ I don't think the idea of extension methods is what OP needs help with. They just don't know what the code for such a method should be. – Broots Waymb Jan 04 '17 at 14:42
  • I'm not asking about extension methods or stringcomparison. Only how to get the ordinal string representation. – Wouter Jan 04 '17 at 14:44
  • 9
    There is no ordinal string _representation_ because ordinal comparison means "compare each byte". – Tim Schmelter Jan 04 '17 at 14:46
  • 1
    @TimSchmelter so why does StringComparison.OrdinalIgnoreCase exist? Which bytes are case sensitive? – Wouter Jan 04 '17 at 14:53
  • 3
    @Wouter because it converts them to uppercase first. From [the docs](https://msdn.microsoft.com/en-us/library/system.stringcomparer.ordinalignorecase(v=vs.110).aspx) *The StringComparer returned by the OrdinalIgnoreCase property treats the characters in the strings to compare as if they were converted to uppercase using the conventions of the invariant culture* – Charles Mager Jan 04 '17 at 14:57
  • @wouter: look how the comparison is implemented(for ASCII): https://referencesource.microsoft.com/#mscorlib/system/string.cs,786821813d8a6340 (both strings will be uppercased) – Tim Schmelter Jan 04 '17 at 14:57
  • @TimSchmelter the referencesource. all ends up in external code... also see my second edit? What are the rules to uppercase them in ordinal comparison? (maybe ASCII first 127 bytes... but for unicode?) – Wouter Jan 04 '17 at 15:03
  • @Wouter: no, if it's ASCII no external code is used, but for non-ASCII `TextInfo.CompareOrdinalIgnoreCase(strA, strB)` is used which uses unmanaged code – Tim Schmelter Jan 04 '17 at 15:04
  • @TimSchmelter yes but that requires external IsASCII. Now if i have a unicode string how is it determined that it contains only ASCII values? – Wouter Jan 04 '17 at 15:09
  • @Wouter: [`IsAscii`](https://referencesource.microsoft.com/#mscorlib/system/string.cs,0c1a6eb865dfa7dd,references) is internal so you cannot use it, as commented it checks if _"the string only contains characters < 0x80"_. Look at [this](http://stackoverflow.com/a/14145356/284240) – Tim Schmelter Jan 04 '17 at 15:21
  • Please note that `string.Equals("ß", "ss", StringComparison.InvariantCultureIgnoreCase)` is now `false` in .NET 6. – Cheng Chen Jul 05 '22 at 06:04

2 Answers

I don't believe this functionality exists in the .NET Framework or .NET Core. The closest thing is string.Normalize(), but it is missing the case fold option that you need to successfully pull this off.
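To illustrate the gap, here is a minimal sketch, assuming a standard .NET runtime with full globalization data. `NormalizeSketch` and `Compat` are hypothetical names used only for this example. It shows that compatibility normalization folds equivalent representations together but leaves case, and "ß", untouched:

```csharp
using System;
using System.Text;

static class NormalizeSketch
{
    // Hypothetical helper: applies Unicode compatibility composition (NFKC).
    // This unifies equivalent representations but performs no case folding.
    public static string Compat(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return value.Normalize(NormalizationForm.FormKC);
    }
}
```

With this helper, `NormalizeSketch.Compat("\uFB01")` (the "fi" ligature) yields the two letters "fi", while `NormalizeSketch.Compat("ß")` stays "ß" and "SS" and "ss" remain distinct strings. That is why normalization alone cannot stand in for a case fold.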

This functionality exists in the ICU project (which is available for C/C++ and Java). What you are after is declared in the unorm2.h header in C and exposed by the Normalizer2 class in Java. Example usage in Java and related test.

There are 2 implementations of Normalizer2 that I am aware of that have been ported to C#:

  • icu-dotnet (a C# wrapper library for ICU4C)
  • ICU4N (a fully managed port of ICU4J)

Full Disclosure: I am a maintainer of ICU4N.

NightOwl888
  • Thanks for this addition. From what I read on MSDN, Normalize doesn't change upper and lower case; it only normalizes the many equivalent binary representations. I also found that Unicode 00df and 1e9e are related, but somehow 1e9e is not the uppercase of 00df. See: http://www.fileformat.info/info/unicode/char/00df/index.htm and http://www.fileformat.info/info/unicode/char/1e9e/index.htm. – Wouter Oct 05 '17 at 15:02
  • Yes, that is why I mentioned it isn't quite up to par in my answer. To make it work like this, a call to the ICU unorm2.h is needed. It would be best to model the API after the Java Normalizer2 class and drop it into the icu.net project so it is available to everyone. – NightOwl888 Oct 05 '17 at 15:27

From MSDN:

The StringComparer returned by the OrdinalIgnoreCase property treats the characters in the strings to compare as if they were converted to uppercase using the conventions of the invariant culture, and then performs a simple byte comparison that is independent of language.

But I'm guessing that doing so won't achieve what you want, since simply calling "ß".ToUpperInvariant() won't give you a string that is ordinally equivalent to "ss". There must be some magic in the String.Equals method that handles the special case described in Why “ss” equals 'ß'.
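The documented behaviour can be approximated as follows. This is only a sketch of the description quoted from MSDN, not the framework's actual implementation, and `OrdinalIgnoreCaseSketch`, `ToOrdinalIgnoreCaseKey` and `EqualsLikeOrdinalIgnoreCase` are hypothetical names introduced for illustration:

```csharp
using System;

static class OrdinalIgnoreCaseSketch
{
    // Hypothetical helper (not a framework API): approximates the string that
    // OrdinalIgnoreCase effectively compares, per the documentation quoted above.
    public static string ToOrdinalIgnoreCaseKey(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return value.ToUpperInvariant();
    }

    // Uppercase both inputs with invariant conventions, then compare ordinally.
    public static bool EqualsLikeOrdinalIgnoreCase(string a, string b)
    {
        return string.Equals(
            ToOrdinalIgnoreCaseKey(a),
            ToOrdinalIgnoreCaseKey(b),
            StringComparison.Ordinal);
    }
}
```

Under this sketch, "abc" and "ABC" compare equal, but "ß" and "ss" do not, because ToUpperInvariant leaves "ß" unchanged. The real comparer may differ in edge cases, so treat the key as an approximation rather than a true "ordinal uppercase" form.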

If you're only worried about German text then this answer might help.

Community
James Crosswell