According to Wikipedia, in 2017 using an uppercase ẞ (Unicode U+1E9E) was officially adopted--at least as an option--for what may in fact be a subset of fully-capitalized words in German:
In June of that year, the Council for German Orthography officially adopted a rule that ⟨ẞ⟩ would be an option for capitalizing ⟨ß⟩ besides the previous capitalization as ⟨SS⟩ (i.e., variants STRASSE and STRAẞE would be accepted as equally valid).
It seems like this addition to German orthography would greatly simplify case-insensitive comparisons between strings (so-called "case-folding" or "fold-case" comparisons). I started this inquiry trying to understand Raku's (a.k.a. Perl 6's) implementation, but the question seems to generalize to other programming languages. Here is Raku's default behavior, starting with 13 words from rfdr_Regeln_2017.pdf that have been lowercased (via Raku's .lc function):
~$ cat TO_ẞ_OR_NOT_TO_ẞ.txt
maß straße grieß spieß groß grüßen außen außer draußen strauß beißen fleiß heißen
~$ raku -ne '.words>>.match(/^ <:Ll>+ $/).say;' TO_ẞ_OR_NOT_TO_ẞ.txt
(「maß」 「straße」 「grieß」 「spieß」 「groß」 「grüßen」 「außen」 「außer」 「draußen」 「strauß」 「beißen」 「fleiß」 「heißen」)
~$ raku -ne '.uc.say;' TO_ẞ_OR_NOT_TO_ẞ.txt
MASS STRASSE GRIESS SPIESS GROSS GRÜSSEN AUSSEN AUSSER DRAUSSEN STRAUSS BEISSEN FLEISS HEISSEN
~$ raku -ne '.fc.say;' TO_ẞ_OR_NOT_TO_ẞ.txt
mass strasse griess spiess gross grüssen aussen ausser draussen strauss beissen fleiss heissen
I'm surprised that Raku's fc fold-case implementation essentially converts ß to lowercase ss. It's no surprise, then, that an eq string-equality test between the upper/lower "round-tripped" words and the lowercased originals returns False for every word:
~$ raku -ne 'for .words {print $_.uc.lc eq $_.lc }; "".put;' TO_ẞ_OR_NOT_TO_ẞ.txt
FalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
The round-tripped words do match the fold-cased (.fc) words, but only because both sides contain ss characters, not ß:
~$ raku -ne 'for .words {print $_.uc.lc eq $_.fc }; "".put;' TO_ẞ_OR_NOT_TO_ẞ.txt
TrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrue
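Since the question generalizes beyond Raku, the same two checks can be sketched in Python. This is only a comparison under an assumption: that Python's str.upper, str.lower, and str.casefold follow the same Unicode default case mappings that Raku's .uc, .lc, and .fc appear to follow. The variable names are my own.

```python
# The same 13 lowercased words used in the Raku examples above.
words = ("maß straße grieß spieß groß grüßen außen außer "
         "draußen strauß beißen fleiß heißen").split()

# Upper/lower round trip: ß -> SS -> ss, so the original spelling is lost.
round_trip_equals_lower = [w.upper().lower() == w.lower() for w in words]

# Case folding also maps ß to ss, so the round trip equals the folded form.
round_trip_equals_fold = [w.upper().lower() == w.casefold() for w in words]

print(round_trip_equals_lower)  # every entry is False
print(round_trip_equals_fold)   # every entry is True
```

If that assumption holds, the behavior is not specific to Raku but comes from the default Unicode mappings both languages use.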
Taking a single word in three spellings (lowercase ß, uppercase SS, and uppercase ẞ) demonstrates the same dichotomy:
~$ echo "straße STRASSE STRAẞE" | raku -ne ' .put for .words;'
straße
STRASSE
STRAẞE
~$ echo "straße STRASSE STRAẞE" | raku -ne ' .lc.say for .words;'
straße
strasse
straße
~$ echo "straße STRASSE STRAẞE" | raku -ne ' for .words { say $_.lc eq "straße" };'
True
False
True
~$ echo "straße STRASSE STRAẞE" | raku -ne ' for .words { say $_.lc eq $_.fc };'
False
True
False
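For comparison, here is a rough Python sketch of the same three-spelling test, again assuming that Python's built-in string methods apply the default Unicode case mappings. It suggests that lowercasing keeps ß while case folding collapses all three spellings to ss.

```python
# The three spellings from the Raku session above.
forms = ["straße", "STRASSE", "STRAẞE"]

# Lowercasing: capital ẞ (U+1E9E) maps back to ß, but SS stays ss.
lowered = [w.lower() for w in forms]

# Case folding: both ß and ẞ fold to ss, so all three forms coincide.
folded = [w.casefold() for w in forms]

print(lowered)                # ['straße', 'strasse', 'straße']
print(folded)                 # ['strasse', 'strasse', 'strasse']
print(len(set(folded)) == 1)  # True
```

So a fold-case comparison treats all three spellings as equal, at the cost of no longer distinguishing ß from ss.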
Have any programming languages instituted a foldcase conversion between lowercase ß <--> uppercase ẞ by default? Which programming languages have added lowercase ß <--> uppercase ẞ conversion as an option (or via a library)? Many questions and answers on StackOverflow pre-date the 2017 decision, so I'm looking for up-to-date answers.
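To make concrete what I mean by an opt-in conversion, here is a minimal Python sketch. Both functions (upper_keep_eszett and same_word) are hypothetical helpers written for illustration, not built-ins of any language or library mentioned here. The sketch assumes that str.upper leaves an existing ẞ unchanged.

```python
def upper_keep_eszett(text: str) -> str:
    """Hypothetical helper: uppercase text, mapping ß to ẞ (U+1E9E)
    instead of the default two-letter SS."""
    return text.replace("ß", "ẞ").upper()


def same_word(a: str, b: str) -> bool:
    """Hypothetical helper: case-insensitive comparison that keeps
    ß/ẞ distinct from ss/SS."""
    return upper_keep_eszett(a) == upper_keep_eszett(b)


print(upper_keep_eszett("straße"))     # STRAẞE
print(same_word("straße", "STRAẞE"))   # True
print(same_word("straße", "STRASSE"))  # False
```

Under this scheme STRAẞE round-trips to straße, but STRASSE no longer compares equal to straße, which is exactly the trade-off I'm asking about.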
[ADDENDUM: I note via this FAQ that the Unicode Consortium's rules appear to be at odds with the 2017 decision of the Council for German Orthography.]