
Coming from this question, I'm wondering why ä and ae are treated as different (which makes sense) but ß and ss are treated as equal. I haven't found an answer on SO, even though this question seems to be related and even mentions "that ß will compare equal to SS in Germany, or similar", but not why.

The only resource on MSDN I found was this: How to: Compare Strings

The following is mentioned there, but it also lacks the why:

// "They dance in the street." 
// Linguistically (in Windows), "ss" is equal to 
// the German essetz: 'ß' character in both en-US and de-DE cultures. 
.....

So why does this evaluate to true, both with the de-DE culture and with any other culture:

using System;
using System.Globalization;

var ci = new CultureInfo("de-DE");
int result = ci.CompareInfo.Compare("strasse", "straße", CompareOptions.IgnoreNonSpace); // 0
bool equals = String.Equals("strasse", "straße", StringComparison.CurrentCulture); // true
equals = String.Equals("strasse", "straße", StringComparison.InvariantCulture);  // true
Community
Tim Schmelter
  • I suppose it's due to http://en.wikipedia.org/wiki/German_orthography_reform_of_1996 – Fabrizio Calderan Apr 24 '15 at 11:08
  • @FabrizioCalderan That's unlikely. The orthographic reform has changed the rules *when* to use *ß* and when not, but that's it. Using *ß* is still required in certain contexts in orthographically correct German (as before), unless you use Swiss orthography (as before), and if, for some reason, you cannot use *ß*, *ss* is still considered as the default replacement (as before). – Uwe Apr 24 '15 at 19:44
  • I ran into the same .NET bug. Yeah right, it MUST be a bug. This strange behaviour makes general use of SortedList impossible! "Straße" and "Strasse" are not the same string. Period. Isn't there any workaround for this? – Tobias81 Jul 10 '15 at 12:06
  • @Tobias81: why does it make general use of a `SortedList` impossible in your case? – Tim Schmelter Jul 10 '15 at 12:09
  • SortedList.Add will throw an exception if I add two words which are considered equal (like "Busse" and "Buße" in the examples below). In my case it happens while reading file names from a file system. – Tobias81 Jul 10 '15 at 13:25
  • @Tobias81: good example, since those really can be completely different words. `Busse` can mean _buses_ (plural of bus) or _atonement_ in German, while `Buße` only means _atonement_. – Tim Schmelter Jul 10 '15 at 13:29
  • A good practical example of when this problem will appear is if someone renames things from the old German spelling (like "Fluß") to the new correct spelling ("Fluss"). A lot can go wrong in code if both are considered equal. – Tobias81 Jul 10 '15 at 13:33
  • @Tobias81: a workaround is to use `new SortedList(StringComparer.Ordinal);`. This is a simple byte comparison that is independent of language. – Tim Schmelter Jul 10 '15 at 13:34
  • @Tobias81: but your second example is not so good in my opinion. Consider that only one word is allowed and you can't have two `Fluß`. Isn't it good that the system doesn't allow `Fluss` because it is the same word(linguistically)? – Tim Schmelter Jul 10 '15 at 13:42
  • The renaming (in our case) also happens in a file system (where both words are allowed). The internal "cache" (implemented as SortedList) differed though. Thanks for the "StringComparer.Ordinal" trick. Will replace every usage soon. – Tobias81 Jul 13 '15 at 17:47
  • I'm German and I can tell you that we use ss instead of ß if we cannot use ß for some reason. It's not the same, but basically everyone knows that you can use ss for ß (but not the other way around). We use AE for Ä, OE for Ö and UE for Ü the same way, in case the umlauts are not available. Sadly, this does not explain why C# considers them the same. – BlueWizard Aug 08 '15 at 19:19
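Tim Schmelter's workaround from the comments can be sketched like this. It is a minimal example, assuming a file-name cache like the one Tobias81 describes; the class `SharpSKeys` and the method `BuildOrdinalCache` are hypothetical names, and it uses the generic `SortedList<TKey, TValue>` rather than the non-generic `SortedList` from the comment. Only the ordinal behaviour is relied on, because whether a culture-sensitive comparer treats the two keys as equal depends on the runtime.

```csharp
using System;
using System.Collections.Generic;

static class SharpSKeys
{
    // Hypothetical helper: builds a sorted cache keyed by file name.
    // StringComparer.Ordinal compares UTF-16 code units, so "Busse" and "Buße"
    // are two distinct keys regardless of the current culture.
    public static SortedList<string, string> BuildOrdinalCache(IEnumerable<string> names)
    {
        var cache = new SortedList<string, string>(StringComparer.Ordinal);
        foreach (var name in names)
        {
            // Add throws ArgumentException only for an exact (ordinal) duplicate.
            cache.Add(name, name);
        }
        return cache;
    }
}
```

Exact duplicates still throw `ArgumentException`, which is usually what you want for a cache that mirrors a file system.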

8 Answers


If you look at the Ä page, you'll see that Ä is not always a replacement for Æ (or ae), and that it is still used in various languages.

The letter ß, on the other hand:

While the letter "ß" has been used in other languages, it is now only used in German. However, it is not used in Switzerland, Liechtenstein or Namibia.[1] German speakers in Germany, Austria, Belgium,[2] Denmark,[3] Luxembourg[4] and South Tyrol, Italy[5] follow the standard rules for ß.

So ß is used in a single language, with a single rule (ß == ss), while Ä is used in multiple languages with multiple rules.

Also note what case folding is used for:

Case folding is primarily used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation

The official Unicode 7.0 case folding properties (CaseFolding.txt) tell us that

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

where 00DF is ß and 0073 is s, so ß can be considered, for caseless comparison, as ss.
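As far as I know, .NET has no public API for full Unicode case folding, so the following is only a hand-written sketch of the idea. The class `SharpSFolding` and its methods are hypothetical; it applies just the full folding entries for U+00DF (ß) and U+1E9E (ẞ), both of which fold to "ss" in CaseFolding.txt, and approximates every other character with `ToLowerInvariant`, which is not the same as full case folding in general.

```csharp
using System;

static class SharpSFolding
{
    // Hypothetical, partial case folding: only the full-folding entries
    // 00DF; F; 0073 0073 (ß) and 1E9E; F; 0073 0073 (ẞ) are applied explicitly.
    // All other characters are approximated with ToLowerInvariant.
    public static string Fold(string s) =>
        s.Replace("\u00DF", "ss")
         .Replace("\u1E9E", "ss")
         .ToLowerInvariant();

    // Caseless comparison in the Unicode sense: fold both sides, then compare
    // the folded strings code unit by code unit.
    public static bool CaselessEquals(string a, string b) =>
        string.Equals(Fold(a), Fold(b), StringComparison.Ordinal);
}
```

Note that this folds Maße and Masse to the same string, which is why caseless matching suits identifiers and search, but not telling words apart.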

xanatos

Some background info for you. Taken from here.

Windows Alt Codes

In Windows, combinations of the ALT key plus a numeric code can be used to type a non-English character (accented letter or punctuation symbol) in any Windows application. More detailed instructions about typing accents with ALT keys are available. Additional options for entering accents in Windows are also listed in the Accents section of this Web site.

Note: The letters ü, ö, ä and ß can be replaced by "ue", "oe", "ae" or "ss" respectively.

German ALT Codes

Symbol   Windows ALT code

Ä   ALT+0196
ä   ALT+0228
Ö   ALT+0214
ö   ALT+0246
Ü   ALT+0220
ü   ALT+0252
ß   ALT+0223
€   ALT+0128
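The substitution rule in the note above (ü, ö, ä and ß replaced by "ue", "oe", "ae" and "ss") can be written as a small fallback transliteration. `GermanFallback.ToAsciiFallback` is a hypothetical helper, not a .NET API. It assumes precomposed (NFC) input, maps capital umlauts to title-case pairs such as "Ae" (only right when the rest of the word is lowercase), and maps ẞ (U+1E9E) to "SS".

```csharp
using System.Collections.Generic;
using System.Text;

static class GermanFallback
{
    // Hypothetical ASCII fallback following the rule ü→ue, ö→oe, ä→ae, ß→ss.
    // Capitals map to title-case pairs ("Ae"), which suits mixed-case words
    // but not ALL-CAPS text. Assumes precomposed (NFC) characters.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['\u00E4'] = "ae", ['\u00F6'] = "oe", ['\u00FC'] = "ue", ['\u00DF'] = "ss",
        ['\u00C4'] = "Ae", ['\u00D6'] = "Oe", ['\u00DC'] = "Ue", ['\u1E9E'] = "SS",
    };

    public static string ToAsciiFallback(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // Replace mapped letters; copy everything else unchanged.
            if (Map.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The mapping is one-way: ss can stand in for ß but not the reverse, so the fallback cannot be undone reliably.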

Taken from here.

In the German alphabet, the letter ß, called "Eszett" (IPA: [ɛsˈtsɛt]) or "scharfes S", in English "sharp S", is a consonant that evolved as a ligature of "long s and z" (ſz) and "long s over round s" (ſs). When spoken, it is pronounced [s] (see IPA). Since the German orthography reform of 1996, it is used only after long vowels and diphthongs, while ss is written after short vowels. The name eszett comes from the two letters S and Z as they are pronounced in German. It is also called scharfes S (IPA: [ˈʃaɐ̯.fəs ˈʔɛs, ˈʃaː.fəs ˈʔɛs]) in German, meaning "sharp S". Its Unicode encoding is U+00DF.

Paul Zahra

Most of what I read here is true. But there are some misconceptions involved, so – as a German – let me put this straight:

ß/ẞ is a genuine German letter that evolved from a ligature of either ſs or ſz, but never ss; that is, a long s followed by either s or z.

In German, a mid-syllable s is pronounced /z/, while an s at the start or end of a syllable is pronounced /s/. Since the letter z in German is always pronounced /ts/, a way was needed to mark the rarer cases that break this rule; this was done by adding another letter, which eventually formed the ligature used wherever a mid-syllable /s/ sound was needed.

The sound /s/ never occurs at the beginning of genuine German words, and only in one foreign word, where it is (tada!) written with sz: Szene. So the need for a capital ß (ẞ) first arose when capitalizing whole words came into use. ß and ss are not the same; historically, ſz and ß are, which is why it is called "eszett"! There are certain rules that allow replacing ß with ss if ß is not available, but that is not the case in modern environments.

The correct capitalization of Maße is MAẞE, and the correct capitalization of Masse is MASSE. They are different words in German.

So, in actual German, ss is /s/ that shortens the preceding vowel, and ß is /s/ after a long vowel. Assuming that ss and ß are equal in any comparison is simply wrong, because it can force words with completely different meanings to compare as equal. Period.
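The capitalization point can be made concrete with a small helper. `GermanUpper.ToUpperGerman` is a hypothetical sketch, not a .NET API: it replaces ß explicitly before calling `ToUpperInvariant`, so the result does not depend on how a given runtime uppercases ß, and its `useCapitalSharpS` flag (an assumption of this sketch) selects between the traditional ß → SS rule and the capital ẞ (U+1E9E).

```csharp
static class GermanUpper
{
    // Hypothetical helper: uppercases German text and chooses how ß is rendered.
    //   useCapitalSharpS == false: traditional rule, ß -> "SS" (Maße -> MASSE)
    //   useCapitalSharpS == true:  capital sharp s, ß -> U+1E9E (Maße -> MAẞE)
    public static string ToUpperGerman(string s, bool useCapitalSharpS)
    {
        string replacement = useCapitalSharpS ? "\u1E9E" : "SS";
        return s.Replace("\u00DF", replacement).ToUpperInvariant();
    }
}
```

With the traditional rule, Maße and Masse both become MASSE and the distinction is lost; only the capital ẞ keeps the two words apart in uppercase.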

rhavin

Just wait half a century.

This year, after over a century of dispute, German officially added ẞ as a valid uppercase replacement for the lowercase ß. It will take some time before people get used to the new uppercase form ẞ, but as soon as the capital version dominates, there will be no reason to continue this evil

String.Equals("Mr. Meißner", "Mr. Meissner", StringComparison.CurrentCulture) == true;

hack.

Sebastian Wagner
  • And here is [the official ruleset](http://www.rechtschreibrat.com/DOX/rfdr_Regeln_2017.pdf) regarding this change. Imho the whole thing never made sense. `SS` and `SZ`, maybe. But `ss` and `sz` were simply wrong, at least since the time I started school (97). – Christian Gollhardt Jul 14 '17 at 12:59
  • @ChristianGollhardt it seems the page has moved to here: https://web.archive.org/web/20170706162042/http://www.rechtschreibrat.com/DOX/rfdr_Regeln_2017.pdf# – jubilatious1 Apr 30 '23 at 04:30

A few background facts:

  • In Swiss German the eszett was eliminated and replaced by ss, in the 70s I think.

  • For uppercase conversion the official German replacement rule has always been and still is eszett -> SS, even though an uppercase eszett (U+1E9E) was defined in Unicode a few years ago. I have never seen it anywhere in the wild yet!

  • No such changes and replacements have been made or have been necessary for the three umlauts äöü, which have always had proper uppercase versions ÄÖÜ, unless you don't have them. Replacing them with ae, oe, ue is only a workaround, though, hardly better than replacing an eszett with a beta or a 'B'.

So the different comparison results make at least some sense, although the treatment, especially with respect to sorting, is not really uniform in Germany across, say, dictionaries, phone books, lists, indices, etc.

TaW
  • "I have never seen it in anywhere in the wild yet!" > http://www.giessener-zeitung.de/global/start/ – rhavin Nov 07 '15 at 21:40
  • Ah, indeed, that is it. Only a logo/name but still.. Thanks for the link! – TaW Nov 07 '15 at 21:49

Because that is how Germans define their own language. Or perhaps most accurately: how those defining sorting/collation for German have defined how Germans define the German language.

In much the way that English defines that the upper case of i is I, but other languages using the Latin alphabet (e.g. Turkish) disagree.

Richard
  • That does not really answer why `ß` is treated like `ss` in ALL languages, whereas `ä` is NOT handled as `ae` – MakePeaceGreatAgain Apr 24 '15 at 11:12
  • @HimBromBeere Treatment of `ß` outside German is ultimately arbitrary because it is always a loan from German (it does not exist in any non-German orthography). – Richard Apr 24 '15 at 11:15

Starting with .NET 5.0, these comparisons return -1 / not equal, because on Windows the globalization APIs now use ICU instead of NLS. See https://learn.microsoft.com/en-us/dotnet/core/compatibility/globalization/5.0/icu-globalization-api for details.

Alex Zhukovskiy

In German, the ß character (which exists in lower case only) sounds like ss. Its usage changes from time to time, and many people confuse ß and ss. If we write a word like Fuß (foot) in all capitals, we'd write FUSS. If a keyboard or a font does not support ß, we write ss, and it is (nearly, mostly) correct.

This may explain why ß and ss are handled as equivalent when it comes to sorting.

DrKoch
  • Does not explain why `new CultureInfo("de-DE").CompareInfo.Compare("strasse", "straße")` returns `0`. – Tim Schmelter Apr 24 '15 at 11:52
  • In fact it shouldn't. These *are* different words which should *sort* the same. If I read them aloud they even sound different (that is why we have an `ß`). – DrKoch Apr 24 '15 at 12:25
  • Unfortunately not anymore since [1996](http://stackoverflow.com/a/29846109/284240). – Tim Schmelter Apr 24 '15 at 12:32
  • Simply not true. It does not exist only in lower case, and the correct capitalization of Fuß is FUẞ. – rhavin Aug 08 '15 at 22:51
  • In the real world, when a user enters an address, are strasse / straße interchangeable? Should strasse / straße be stemmed in free-text searching? – dunxz Nov 30 '22 at 08:05