I have an ASP.NET Core project with an MVC configuration, running on ASP.NET Core 5.0. My native language is German, and therefore our database is also filled with German words, for example the word "fußball" (which means football or soccer, depending on where you are from).

As you can see, this word has an ß. In German, this "ß" is basically equivalent to "ss". Therefore, if I have the string "fußball", I also want to be able to find it when someone searches for "fussball".

I understand that ASP.NET Core has good localization and globalization options, but I just can't seem to figure this one out.

Consider the following code:

var currCulture = CultureInfo.CurrentCulture.Name; // = "de-AT"

var str1 = "fußball";
str1.StartsWith("fuss"); //returns false
str1.StartsWith("fuss", StringComparison.InvariantCulture); //returns false
String.Equals("ß", "ss", StringComparison.InvariantCulture); //returns false

Since I run my Windows PC in English, and I read in another Stack Overflow question that the CultureInfo depends on the operating system, I decided to insert the following into my Startup.cs file, as suggested in this Stack Overflow question:

var cultureInfo = new CultureInfo("de-AT"); // de-AT for Austria; I also tried de-DE for Germany, but the result was the same
cultureInfo.NumberFormat.CurrencySymbol = "€";

CultureInfo.DefaultThreadCurrentCulture = cultureInfo;
CultureInfo.DefaultThreadCurrentUICulture = cultureInfo;

Unfortunately, with my current setup, string comparison always tells me that "ß" and "ss" are not the same. The same goes for "ä" and "ae", but I need both spellings to match, regardless of whether the input is "ä"/"ae" or "ß"/"ss".

Any ideas what I've been doing wrong are greatly appreciated, I just can't seem to get this to work.

Thank you in advance & best regards!

Luckey
  • Have you tried ```Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("de-AT");```? It seems to work for ß/ss, and pertaining to "ä"/"ae": https://stackoverflow.com/questions/29845872/why-is-ss-equal-to-the-german-sharp-s-character-%C3%9F – jh316 Nov 10 '21 at 14:24
  • Hi @jh316, thanks for the suggestion. I have now tried it in `Startup.cs` under `Configure`, right below where I set the general CultureInfo. Unfortunately, it doesn't change anything. The comparison results stay the same – Luckey Nov 10 '21 at 14:27
  • Why not use StringComparison.CurrentCulture instead of using StringComparison.InvariantCulture in String.Equals? – jh316 Nov 10 '21 at 14:41
  • I think you actually want to use the current culture, rather than invariant culture. Then the code works as expected: https://dotnetfiddle.net/DUTPeA – oleksii Nov 10 '21 at 14:43
  • Mainly because other posts mostly used `StringComparison.InvariantCulture`. I have now tried once again with `StringComparison.CurrentCulture`; unfortunately, the result remains the same – Luckey Nov 10 '21 at 14:45
  • @oleksii Not in .NET 5 .... .NET 5 uses a different unicode library. https://dotnetfiddle.net/QYDhXg – Lasse V. Karlsen Nov 10 '21 at 14:47
  • @LasseV.Karlsen you are right, I've compiled for .net 6rc2 and it fails... https://dotnetfiddle.net/gSLoCQ. Could you explain the diff please, if you have time? – oleksii Nov 10 '21 at 14:50
  • I only know that the Unicode library used on Windows was the code found in Win32 subsystem of Windows, and that in .NET 5 they took the breaking change to switch to ICU (International Components for Unicode) library, across the board and platforms. I don't believe anyone has the full list of changes this causes but there are plenty. – Lasse V. Karlsen Nov 10 '21 at 14:53
  • @oleksii https://learn.microsoft.com/en-us/dotnet/standard/base-types/string-comparison-net-5-plus – GSerg Nov 10 '21 at 14:54
  • That may lead to a whole new problem, because there are indeed words in German that have different meanings when written with "ss" or "ß", for instance "Masse" vs. "Maße". These are two different words. Furthermore, it's not generally true that "ss" == "ß". From a historical point of view, the "ß" developed from a ligature of "sz", not "ss". Take for instance the words "Fluss" (which was still spelled "Fluß" a few years back) vs. "Fuß". While in the first one the ß takes the position of an ss (because ss wasn't allowed at the end of a word), it takes the place of sz in the second one – derpirscher Nov 10 '21 at 22:00
  • This can also be derived from pronunciation. I.e., when the vowel is pronounced short, as in Fluss, it's always "ss" nowadays. A few years back, ss was only allowed between two vowels, and wherever it wasn't allowed a ß was used. You can see this for instance in (old spelling) Fluß and Flüsse (singular and plural of river). But in Fuß vs. Füße (singular and plural of foot), it stays ß because the vowel is long. – derpirscher Nov 10 '21 at 22:08

2 Answers

.NET Core on Windows used the OS's built-in NLS localisation library, whereas other operating systems used the cross-platform and standard-compliant ICU library. Because NLS and ICU differ in implementation, this meant that the same .NET Core program could produce different results on different platforms when comparing strings.

To prevent this confusion, from .NET 5 the decision was made to use ICU for all platforms, including Windows. However, since many apps (yours included) are written on Windows and thus assume that string comparisons work in the NLS way, there are some things you need to do to make them work as expected with ICU.

In your case, you can explicitly set the CurrentCulture then ensure you explicitly use it in string comparisons:

using System.Globalization;

CultureInfo.CurrentCulture = new CultureInfo("de-AT", false);
Console.WriteLine($"CurrentCulture is {CultureInfo.CurrentCulture.Name}.");

string first = "Sie tanzen auf der Straße.";
string second = "Sie tanzen auf der Strasse.";

bool b = string.Equals(first, second, StringComparison.CurrentCulture);
Console.WriteLine($"The two strings {(b ? "are" : "are not")} equal.");

// CurrentCulture is de-AT.
// The two strings are equal.

By placing the following in your .csproj you can revert to NLS if ICU causes too many issues for your application, but this should only really be used while you are upgrading your code to work with ICU:

  <ItemGroup>
      <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
  </ItemGroup>
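
For reference, the same switch can, as far as I know, also be set outside the .csproj. This is a sketch based on the setting names in Microsoft's globalization configuration docs, not something I have tested against your project: either set the environment variable `DOTNET_SYSTEM_GLOBALIZATION_USENLS` to `true`, or put the property into the runtime configuration file (typically via a `runtimeconfig.template.json` next to the project file, which is my assumption about your setup):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

Like the .csproj option, this only has an effect on Windows, where NLS is available.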
chadnt

As @chadnt's answer describes, .NET 5 introduced a breaking change that causes this problem.

Here is another solution I found, which works even with .NET 6/7.

string sb = "ß";
string ss = "ss";

StringComparer sc1 = StringComparer.Create(CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
Console.WriteLine(sc1.Compare(sb, ss));
// returns 0; zero means sb and ss are equal
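
The question was about `StartsWith` rather than a full comparison, so here is a minimal sketch of how the same option might be applied to a prefix search. `StartsWithLoose` and `Fold` are hypothetical helper names of mine, not part of .NET. `CompareInfo.IsPrefix` does accept `CompareOptions`, but I have not verified its result for ß/ss on every ICU version, so no output is claimed for that line. Since ignoring non-spacing marks would at best reduce "ä" to "a" and not to "ae", the sketch also includes a plain transliteration fallback, which is a different technique from the comparer above:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: culture-aware prefix test that ignores case and
// non-spacing marks, using the same CompareOptions as the comparer above.
static bool StartsWithLoose(string text, string prefix, CultureInfo culture) =>
    culture.CompareInfo.IsPrefix(
        text, prefix, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase);

// Hypothetical fallback: lower-case the text, then replace German special
// characters with their two-letter spellings (string.Replace is ordinal).
static string Fold(string s) =>
    s.ToLowerInvariant()
     .Replace("ä", "ae")
     .Replace("ö", "oe")
     .Replace("ü", "ue")
     .Replace("ß", "ss");

var culture = CultureInfo.GetCultureInfo("de-AT");

// Depends on the runtime's collation data, so no expected output is given.
Console.WriteLine(StartsWithLoose("fußball", "fuss", culture));

// Ordinal comparison on folded strings; both lines print True.
Console.WriteLine(Fold("fußball").StartsWith(Fold("fuss"), StringComparison.Ordinal));
Console.WriteLine(Fold("Käse").StartsWith(Fold("kae"), StringComparison.Ordinal));
```

Keep in mind that the folding approach deliberately throws information away: as noted in the comments on the question, "Masse" and "Maße" would become indistinguishable, so it is better suited to search matching than to sorting or equality checks.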
JanB