I'm trying to address a hypothetical concern raised by a coworker regarding case-insensitive comparisons versus using toLowerCase.
Wherever possible, we use equalsIgnoreCase (or whatever the specific language / environment provides) for case-insensitive string comparisons. But in some cases, we have to compute a hash of the string and compare hashes. We use hash(s.toLowerCase(Locale.US)) to avoid failing the Turkey Test, but there is still some concern that there could exist a pair of strings for which
s1.equalsIgnoreCase(s2) != (s1.toLowerCase(Locale.US) == s2.toLowerCase(Locale.US))
i.e. the case-insensitive comparison says one thing, but the locale-specified lowercase comparison says something else.
If such a pair existed, a comparison of hashes generated by sha(s.toLowerCase(Locale.US)) could tell us that the two strings aren't equal when in fact they are.
Does a pair of strings (s1, s2) exist which satisfies the expression above?
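One way to probe this empirically is a brute-force scan over single-character strings. The sketch below is only an illustration: the names lowerUS, candidates, groups and mismatches are made up for this example, and it only looks for the direction that matters for hashing (equalsIgnoreCase says "equal" while the lowercased forms differ). It assumes that, for a single non-surrogate char, equalsIgnoreCase can only return true when the "lowercase of the uppercase" of both chars matches, which is how the JDK documents the comparison.

```scala
import java.util.Locale

// Lowercase with a fixed locale, as in the question.
def lowerUS(s: String): String = s.toLowerCase(Locale.US)

// Every single-char string in the Basic Multilingual Plane, skipping surrogates.
val candidates: Seq[String] =
  (0 to 0xFFFF).map(_.toChar).filterNot(c => Character.isSurrogate(c)).map(_.toString)

// Group by "lowercase of the uppercase". Only strings in the same group can be
// equal under equalsIgnoreCase, so we never have to compare every possible pair.
val groups: Map[Char, Seq[String]] =
  candidates.groupBy(s => Character.toLowerCase(Character.toUpperCase(s.charAt(0))))

// Pairs that equalsIgnoreCase treats as equal but whose lowercased forms differ.
val mismatches: Seq[(String, String)] = for {
  group <- groups.values.toSeq
  a <- group
  b <- group
  if a < b
  if a.equalsIgnoreCase(b) && lowerUS(a) != lowerUS(b)
} yield (a, b)

println(s"Found ${mismatches.size} mismatching single-char pairs")
mismatches.take(10).foreach { case (a, b) => println(s"$a / $b") }
```

The exact set of pairs depends on the Unicode version shipped with the JVM, so treat the output as exploratory rather than exhaustive.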
Follow-up edit in lieu of answering my own question, since I accepted a linked answer that was provided in the comments.
One example is the pair of strings ("ϑ", "ϴ") (the small theta symbol and the capital theta symbol). equalsIgnoreCase considers them equal, but their .toLowerCase(Locale.US) results differ: ϑ is already lowercase and stays as it is, while ϴ lowercases to the ordinary θ.
This hits the edge case shown here:
import java.util.Locale
def lower(s: String) = s.toLowerCase(Locale.US)
def upper(s: String) = s.toUpperCase(Locale.US)
val s1 = "ϑ"
val s2 = "ϴ"
println(s"CI Equals: ${s1 equalsIgnoreCase s2}") // true
println(s"Lower Equals: ${lower(s1) == lower(s2)}") // false
println(s"UpperLower Equals: ${lower(upper(s1)) == lower(upper(s2))}") // true
To quote the linked answer,
a true case insensitive comparison [...] must check the lowercases of the uppercases.
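Applied to the hashing use case, that advice could look like the sketch below. This is an assumption about how one might wire it up, not something taken from the linked answer: foldCase and caseInsensitiveHash are hypothetical helper names, and SHA-256 stands in for whatever hash is actually used. The fold works char by char (lowercase of the uppercase) so that it mirrors how equalsIgnoreCase compares characters.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper: fold each UTF-16 char to the lowercase of its uppercase,
// with no locale involved, mirroring the per-char rule of equalsIgnoreCase.
def foldCase(s: String): String =
  s.map(c => Character.toLowerCase(Character.toUpperCase(c)))

// Hypothetical helper: hex-encoded SHA-256 of the folded string.
def caseInsensitiveHash(s: String): String = {
  val digest = MessageDigest.getInstance("SHA-256")
  val bytes = digest.digest(foldCase(s).getBytes(StandardCharsets.UTF_8))
  bytes.map(b => "%02x".format(b & 0xff)).mkString
}

val h1 = caseInsensitiveHash("ϑ")
val h2 = caseInsensitiveHash("ϴ")
println(s"Hashes equal: ${h1 == h2}")
```

The per-char fold is a deliberate choice here: String.toUpperCase can change the length of a string (for example "ß" becomes "SS"), which equalsIgnoreCase does not account for, so folding whole strings with toUpperCase followed by toLowerCase could treat some strings as equal that equalsIgnoreCase does not.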