I ran into this example where s1 < s2 and s2 < s3 but (s1 < s3) is false:

var str1 = "あいかぎ"
var str2 = "あいかくしつ"
var str3 = "あいがみ:"

print(str1 < str2)       // True
print(str2 < str3)       // True
print(str1 < str3)       // False (?)

Is this a bug, or is it true that we cannot rely on string comparison being transitive? (This breaks my sorting of a string array.) I'm running Swift 3.

Update: all of these are False

print(str1 < str3)       // False (?)
print(str1 == str3)       // False (?)
print(str1 > str3)       // False (?)

So some strings are not comparable with each other?

Update: a comment on "How does the Swift string more than operator work" pointed out that the source code for the < operator is in https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift, and that the comparison is handled by _swift_stdlib_unicode_compare_utf8_utf8 in https://github.com/apple/swift/blob/master/stdlib/public/stubs/UnicodeNormalization.cpp

Update: These are true

print(str1 >= str3)  // True
print(str1 <= str3)  // True

Update: there is an issue with String.localizedCompare() too. There are two strings where s1 = s2 but s2 > s1:

str1 = "bảo toàn"
str2 = "bảo tồn"

print(str1.localizedCompare(str2) == .orderedSame) // true
print(str2.localizedCompare(str1) == .orderedDescending) // true
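
A minimal sketch for checking this kind of inconsistency on a given system. The helper name `isConsistent` is hypothetical (not part of Foundation); it only reports whether comparing (a, b) and (b, a) gives mirrored results, and the outcome for the two strings above will depend on the OS and Foundation version.

```swift
import Foundation

// Hypothetical helper: true when compare(a, b) and compare(b, a) mirror each other,
// i.e. both are "same", or one is ascending and the other descending.
func isConsistent(_ a: String, _ b: String,
                  using compare: (String, String) -> ComparisonResult) -> Bool {
    switch (compare(a, b), compare(b, a)) {
    case (.orderedSame, .orderedSame),
         (.orderedAscending, .orderedDescending),
         (.orderedDescending, .orderedAscending):
        return true
    default:
        return false
    }
}

let s1 = "bảo toàn"
let s2 = "bảo tồn"
// Prints false if localizedCompare disagrees with itself for this pair.
print(isConsistent(s1, s2) { $0.localizedCompare($1) })
```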
Pinch
  • Also, `print(str1 >= str3)` and `print(str1 <= str3)` both print true :) – Palle Sep 15 '17 at 02:05
  • I have a feeling that the answer is here: https://stackoverflow.com/a/25775112/341994 – matt Sep 15 '17 at 02:11
  • @LeoDabus No, the solution has something to do with Unicode normalization form D. – matt Sep 15 '17 at 02:17
  • But that isn't what he asked. He wants to _know_ something, not _do_ something. He isn't looking for a workaround, he's looking for an _explanation_. And so am I. – matt Sep 15 '17 at 02:38
  • This article about the Unicode Collation Algorithm might be useful: http://www.unicode.org/reports/tr10/ – 0x384c0 Sep 15 '17 at 04:35
  • A slightly shorter example is `var str1 = "かぎ"; var str2 = "かく"; var str3 = "がみ"`. The Unicode scalars in normalization form D are `U+304B U+304D U+3099`, `U+304B U+304F`, and `U+304B U+3099 U+307F`, respectively. So `str1 < str3` should be true, and I have no idea why it isn't. – Martin R Sep 15 '17 at 05:35
  • @LeoDabus There is a related issue with String.localizedCompare() too. I updated the question with an example. – Pinch Sep 16 '17 at 00:29
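
The scalar values quoted in the last-but-one comment can be reproduced with a short sketch like the one below. It assumes Foundation is available for `decomposedStringWithCanonicalMapping`; the function name `nfdScalars` is made up for illustration.

```swift
import Foundation

// Returns the Unicode scalars of `s` after canonical decomposition (NFD),
// formatted as "U+XXXX" strings.
func nfdScalars(_ s: String) -> [String] {
    return s.decomposedStringWithCanonicalMapping.unicodeScalars.map {
        "U+" + String($0.value, radix: 16, uppercase: true)
    }
}

for s in ["かぎ", "かく", "がみ"] {
    print(s, nfdScalars(s).joined(separator: " "))
}
```

In form D the voiced kana split into a base character plus the combining mark U+3099, which is why the first scalar is U+304B in all three strings.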

1 Answer

It looks like this is not supposed to happen:

Q: Is transitive consistency maintained by the [Unicode Collation Algorithm]?

A: Yes, for any strings A, B, and C, if A < B and B < C, then A < C. However, implementers must be careful to produce implementations that accurately reproduce the results of the Unicode Collation Algorithm as they optimize their own algorithms. It is easy to perform careless optimizations — especially with Incremental Comparison algorithms — that fail this test. Other items to check are the proper distinction between the bases of accents. For example, the sequence <u-macron, u-diaeresis-macron> should compare as less than <u-macron-diaeresis, u-macron>; this is a secondary distinction, based on the weighting of the accents, which must be correctly associated with the primary weights of their respective base letters.

(Source: Unicode Collation FAQ)

UnicodeNormalization.cpp calls ucol_strcoll and ucol_strcollIter, which are part of the ICU project, so this may be a bug in either the Swift standard library or ICU. I reported this issue to the Swift Bug Tracker.
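
To look for other failing triples, a brute-force check along these lines may help. This is only a sketch: `transitivityViolations` is a hypothetical helper, not a standard library function, and whether it reports anything for the strings from the question depends on the Swift version and the ICU build in use.

```swift
// Hypothetical helper: returns every ordered triple (a, b, c) drawn from `items`
// for which a < b and b < c hold under the given predicate, but a < c does not.
func transitivityViolations<T>(in items: [T],
                               by areInIncreasingOrder: (T, T) -> Bool) -> [(T, T, T)] {
    var violations: [(T, T, T)] = []
    for a in items {
        for b in items where areInIncreasingOrder(a, b) {
            for c in items where areInIncreasingOrder(b, c) {
                if !areInIncreasingOrder(a, c) {
                    violations.append((a, b, c))
                }
            }
        }
    }
    return violations
}

let words = ["あいかぎ", "あいかくしつ", "あいがみ:"]
for (a, b, c) in transitivityViolations(in: words, by: <) {
    print("\(a) < \(b) and \(b) < \(c), but not \(a) < \(c)")
}
```

The check is cubic in the number of strings, so it is only meant for small test sets, not for validating a whole array before sorting.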

Palle