
I'm trying to address a hypothetical concern brought up by a coworker, regarding case-insensitive comparisons versus using toLowerCase.

Wherever possible, we try to use equalsIgnoreCase (or whatever the specific language / environment provides) for case-insensitive string comparisons. But in some cases, we have to compute a hash of the string and compare hashes. We use hash(s.toLowerCase(Locale.US)) to avoid failing the Turkey Test, but still there is some concern that there could exist a pair of strings for which

s1.equalsIgnoreCase(s2) != (s1.toLowerCase(Locale.US) == s2.toLowerCase(Locale.US))

i.e. the case-insensitive comparison says one thing, but the locale-specified lowercase comparison says something else.

If such a pair existed, comparing hashes computed as hash(s.toLowerCase(Locale.US)) could tell us that the two strings differ even though equalsIgnoreCase considers them equal.
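For context on the Turkey Test mentioned above: under Turkish casing rules, uppercase `I` lowercases to dotless `ı` (U+0131), so lowercasing with a Turkish default locale changes plain ASCII input. Here is a minimal Scala sketch of why the hash input is pinned to `Locale.US`. The `"tr"` language tag and the sample string are illustrative assumptions.

```scala
import java.util.Locale

// The same ASCII input lowercased under Turkish rules and under US rules.
val turkish = Locale.forLanguageTag("tr")
val input = "TITLE"

val trLower = input.toLowerCase(turkish)   // Turkish rules map 'I' to dotless 'ı' (U+0131)
val usLower = input.toLowerCase(Locale.US) // US rules map 'I' to 'i'

println(s"tr: $trLower, US: $usLower, equal: ${trLower == usLower}")
```

On a JVM whose default locale is Turkish, hashing `input.toLowerCase()` without an explicit locale would therefore give different hashes for "TITLE" and "title".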

Does a pair of strings (s1, s2) exist which satisfies the expression above?
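One way to probe this empirically, sketched in Scala under some assumptions. Only single-character strings from the Basic Multilingual Plane are checked (surrogates excluded). Characters are grouped by `Character.toLowerCase(Character.toUpperCase(c))`, which I am assuming matches the per-character rule `equalsIgnoreCase` applies. The search only looks in the direction that matters for the hash comparison: pairs that `equalsIgnoreCase` accepts but whose `Locale.US` lowercases differ. Every candidate pair is re-checked with the real `String` methods.

```scala
import java.util.Locale

// Every BMP code unit except surrogates, as a candidate single-character string.
val chars: Seq[Char] =
  (0 to 0xFFFF).map(_.toChar).filterNot(c => Character.isSurrogate(c))

// Assumption: equalsIgnoreCase treats two chars as equal exactly when
// lower(upper(c)) matches, so chars sharing that key are "equal ignoring case".
val groups: Map[Char, Seq[Char]] =
  chars.groupBy(c => Character.toLowerCase(Character.toUpperCase(c)))

// Groups whose members do NOT all share the same Locale.US lowercase string.
val counterexamples: Seq[Seq[Char]] =
  groups.values.toSeq.filter { g =>
    g.map(c => c.toString.toLowerCase(Locale.US)).distinct.size > 1
  }

// All unordered pairs inside those groups, as single-character strings.
val pairs: Seq[(String, String)] =
  for {
    g <- counterexamples
    a <- g
    b <- g
    if a < b
  } yield (a.toString, b.toString)

// Keep only pairs where the two comparisons really disagree.
val disagreeing: Seq[(String, String)] = pairs.filter { case (s1, s2) =>
  s1.equalsIgnoreCase(s2) != (s1.toLowerCase(Locale.US) == s2.toLowerCase(Locale.US))
}

disagreeing.foreach { case (s1, s2) =>
  println(f"U+${s1.charAt(0).toInt}%04X vs U+${s2.charAt(0).toInt}%04X")
}
```

Restricting the search to single characters keeps it fast. It can show that such pairs exist, but it cannot show that multi-character strings are free of further cases.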


Follow-up edit in lieu of answering my own question, since the question linked in the comments (which I accepted as a duplicate) answers it.

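One example is the pair of strings ("ϑ", "ϴ"), i.e. U+03D1 GREEK THETA SYMBOL and U+03F4 GREEK CAPITAL THETA SYMBOL. Both are considered equal by equalsIgnoreCase, but .toLowerCase(Locale.US) does not make them equal: "ϑ" is already lowercase and is left unchanged, while "ϴ" becomes "θ" (U+03B8 GREEK SMALL LETTER THETA).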
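To double-check which characters are involved and what `toLowerCase(Locale.US)` does to each, here is a small Scala sketch using `Character.getName` (available since Java 7). The characters are written as escapes to avoid look-alike confusion.

```scala
import java.util.Locale

// The two characters from the example, as Unicode escapes.
val pair = Seq("\u03D1", "\u03F4")

for (s <- pair) {
  val cp = s.codePointAt(0)
  val lower = s.toLowerCase(Locale.US)
  // Print the code point, its Unicode name, and the code point of its US-locale lowercase form.
  println(f"U+$cp%04X ${Character.getName(cp)} -> U+${lower.codePointAt(0)}%04X")
}
```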

This hits the edge case shown here:

```scala
import java.util.Locale

// Case conversions pinned to Locale.US, so Turkish dotted/dotless i rules never apply
def lower(s: String) = s.toLowerCase(Locale.US)
def upper(s: String) = s.toUpperCase(Locale.US)

val s1 = "ϑ" // U+03D1 GREEK THETA SYMBOL
val s2 = "ϴ" // U+03F4 GREEK CAPITAL THETA SYMBOL
println(s"CI Equals: ${s1 equalsIgnoreCase s2}") // true
println(s"Lower Equals: ${lower(s1) == lower(s2)}") // false
println(s"UpperLower Equals: ${lower(upper(s1)) == lower(upper(s2))}") // true
```
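Carrying the same pair through the hashing scheme from the question shows the practical consequence. This sketch assumes SHA-256 via `java.security.MessageDigest` over UTF-8 bytes. The helper name `lowerHash` is hypothetical, since the question only says "hash".

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Locale

// Hypothetical helper: hex-encoded SHA-256 of the Locale.US lowercase form of s.
def lowerHash(s: String): String = {
  val digest = MessageDigest.getInstance("SHA-256")
    .digest(s.toLowerCase(Locale.US).getBytes(StandardCharsets.UTF_8))
  digest.map(b => "%02x".format(b & 0xff)).mkString
}

val theta1 = "\u03D1" // ϑ
val theta2 = "\u03F4" // ϴ

val ciEqual = theta1.equalsIgnoreCase(theta2)          // equal ignoring case...
val hashEqual = lowerHash(theta1) == lowerHash(theta2) // ...yet the lowercase hashes differ
println(s"equalsIgnoreCase: $ciEqual, hashes equal: $hashEqual")
```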

To quote the linked answer,

a true case insensitive comparison [...] must check the lowercases of the uppercases.
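Here is a minimal sketch of a hash input that follows the quoted rule. It makes two assumptions. First, the rule is applied per code point with `Character.toUpperCase`/`Character.toLowerCase`, which I am assuming mirrors how `equalsIgnoreCase` compares characters. Second, SHA-256 is the hash. `caseKey` and `caseInsensitiveHash` are hypothetical names.

Applying the rule per code point, rather than with the whole-string `toUpperCase`, matters because whole-string conversion can change length. `"ß".toUpperCase(Locale.US)` is `"SS"`, so a whole-string `lower(upper(s))` treats "ß" and "ss" as the same even though `equalsIgnoreCase` does not.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Locale

// Hypothetical key: map every code point through lower(upper(cp)) individually.
def caseKey(s: String): String = {
  val sb = new java.lang.StringBuilder(s.length)
  var i = 0
  while (i < s.length) {
    val cp = s.codePointAt(i)
    sb.appendCodePoint(Character.toLowerCase(Character.toUpperCase(cp)))
    i += Character.charCount(cp)
  }
  sb.toString
}

// Hypothetical hash: hex-encoded SHA-256 of the UTF-8 bytes of the key.
def caseInsensitiveHash(s: String): String =
  MessageDigest.getInstance("SHA-256")
    .digest(caseKey(s).getBytes(StandardCharsets.UTF_8))
    .map(b => "%02x".format(b & 0xff))
    .mkString

// Whole-string variant, for comparison: can change the string's length.
def lowerUpper(s: String): String = s.toUpperCase(Locale.US).toLowerCase(Locale.US)

println(caseInsensitiveHash("\u03D1") == caseInsensitiveHash("\u03F4")) // ϑ vs ϴ
println(lowerUpper("\u00DF") == lowerUpper("ss"))                       // ß vs ss, whole string
println(caseKey("\u00DF") == caseKey("ss"))                             // ß vs ss, per code point
```

Whether the per-code-point key agrees with `equalsIgnoreCase` on every input, for example supplementary characters on older JDKs, should be verified against the JDK version in use.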

Dylan
  • This question shouldn't be "language agnostic" because the implementation of `equalsIgnoreCase` and `toLowerCase` depends on the language. – Dialecticus Jun 28 '22 at 14:49
  • @Dialecticus For the purposes of this question, assume the two functions are "correct" to the greatest possible extent. E.g. `equalsIgnoreCase` is a true case-insensitive equality check, and `toLowerCase(US)` performs a lowercase conversion the way an American would. I expressed them in terms of the language I happen to be using, but my intent is to find a programming-language-agnostic answer. – Dylan Jun 28 '22 at 15:07
  • That said, I'm curious now as to how different programming languages would provide different behavior for a locale-specified `toLowerCase` function. Any examples? – Dylan Jun 28 '22 at 15:08
  • I understood that you have an actual project written in an actual language, and a possible problem in your actual project. It is counterproductive to look for hypothetical language-agnostic solutions if the problem is constrained to an actual language. Why do you insist on language agnostic if your project is written in java? Makes no sense. – Dialecticus Jun 28 '22 at 15:18
  • Java users may answer with "ah yes, I know about this problem, and here's how I have solved it". But Java users may not find your question, because it is not tagged with the java tag. – Dialecticus Jun 28 '22 at 15:20
  • Did you mean `equals` instead of the second `==` in the title? – Mad Physicist Jun 28 '22 at 18:18
  • That depends on how `equalsIgnoreCase` is implemented, no? – Mad Physicist Jun 28 '22 at 18:19
  • @MadPhysicist For a Java reader, yes. I'm using Scala so `==` and `equals` are the same thing, but someone edited the tags to remove the scala tag – Dylan Jun 28 '22 at 18:19
  • Relevant: https://stackoverflow.com/q/60196641/2988730 – Mad Physicist Jun 28 '22 at 18:22
  • @MadPhysicist I accepted your linked question as a duplicate since it gave me enough information to answer my question; I've edited my question to include that info in lieu of adding an answer – Dylan Jun 28 '22 at 19:16
  • ...maybe I should've just added an answer, since that apparently was a confusing thing to do. – Dylan Jun 28 '22 at 19:17
