String comparison (>) returns different results on different platforms?

Question

Consider the following predicate

print("S" > "g")

Running this in Xcode prints false, whereas running it on the tutorialspoint online compiler or, e.g., the IBM Swift Sandbox (Swift Dev. 4.0 (Sep 5, 2017) / Platform: Linux (x86_64)) prints true.

Why does the predicate give a different result on the online compilers (Linux?) than in Xcode?

I played with the code and noticed that letter case makes a difference: comparing "S" with "G" instead of "g" gave me the expected result, so I'm guessing it has something to do with this. — FA95, Nov 18 '17 at 11:51
If performing lexicographical comparison of single ASCII characters (such as `"S"` or `"G"`), we would expect the corresponding ASCII value to be used in the comparison (or its unicode scalar value: for ASCII characters the relative ordering ASCII -> (hex)unicode scalar will be preserved). If so, naturally `"S"` (ASCII 83) will be ordered prior to `"g"` (ASCII 103), whereas `"G"` (ASCII 71) will be ordered prior to `"S"`. This is also the case (as mentioned in my answer) for the Apple platforms, whereas Linux platforms will order `"S"` vs `"g"` according to UCA, and order `"g"` prior to `"S"`. — dfrib, Nov 18 '17 at 11:54

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

This is a known open "bug" (or perhaps rather a known limitation):

SR-530 - [String] sort order varies on Darwin vs. Linux

Quoting Dave Abrahams' comment to the open bug report:

This will mostly be fixed by the new string work, wherein String's default sort order will be implemented as a lexicographical ordering of FCC-normalized UTF16 code units.

Note that on both platforms we rely on ICU for normalization services, and normalization differences among different implementations of ICU are a real possibility, so there will never be a guarantee that two arbitrary strings sort the same on both platforms.

However, for Latin-1 strings such as those in the example, the new work will fix the problem.

Moreover, from the String Manifesto:

Comparing and Hashing Strings

...

Following this scheme everywhere would also allow us to make sorting behavior consistent across platforms. Currently, we sort String according to the UCA, except that--only on Apple platforms--pairs of ASCII characters are ordered by unicode scalar value.

Most likely, for the OP's particular example (which involves only ASCII characters), comparison according to the UCA (Unicode Collation Algorithm) is used on Linux platforms, whereas on Apple platforms these single-ASCII-character String instances (or String instances starting with ASCII characters) are ordered by unicode scalar value.

import Foundation // needed for String(format:)

// ASCII value
print("S".unicodeScalars.first!.value) // 83
print("g".unicodeScalars.first!.value) // 103

// Unicode scalar value
print(String(format: "%04X", "S".unicodeScalars.first!.value)) // 0053
print(String(format: "%04X", "g".unicodeScalars.first!.value)) // 0067

print("S" < "g") // 'true' on Apple platforms (comparison by unicode scalar value),
                 // 'false' on Linux platforms (comparison according to UCA)
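If an ordering that does not depend on the platform is needed in the meantime, one possible workaround (my own sketch, not something proposed in the bug report) is to bypass String's own comparison and compare the unicode scalar views directly. The helper name `scalarPrecedes` below is hypothetical; it relies only on the standard library's `lexicographicallyPrecedes(_:)`.

```swift
// Hypothetical helper (not part of the standard library): orders two strings
// by their unicode scalar values instead of using String's own `<` operator.
// Assumption: no Unicode normalization is wanted, so canonically equivalent
// strings with different scalar sequences are treated as different.
func scalarPrecedes(_ lhs: String, _ rhs: String) -> Bool {
    return lhs.unicodeScalars.lexicographicallyPrecedes(rhs.unicodeScalars)
}

print(scalarPrecedes("S", "g")) // true  (83 < 103)
print(scalarPrecedes("g", "S")) // false

// The same helper can be passed as a sort predicate.
print(["g", "S", "G"].sorted(by: scalarPrecedes)) // ["G", "S", "g"]
```

Since this compares raw scalar values, it should give the same answer wherever it runs, at the cost of ignoring any collation or normalization rules.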

See also the excellent accepted answer to the following Q&A:

What does it mean that string and character comparisons in Swift are not locale-sensitive?

Related: https://stackoverflow.com/questions/43921538/swift-how-to-sort-dict-keys-by-byte-value-and-not-alphabetically — Martin R, Nov 18 '17 at 13:01
@MartinR thanks. Should we possibly close this thread as a duplicate of that one? — dfrib, Nov 18 '17 at 15:04
@FA95 The question itself isn't a duplicate (and asking it was entirely justified), but the answer in the linked thread provides sufficient info (similar to my own answer) to answer it. In such cases, marking threads as duplicates is even encouraged, as they will then be connected; someone visiting the other thread will find this one and vice versa. — dfrib, Nov 18 '17 at 19:26
