If I am sorting two strings, café and cafe, is there a best practice as to which one comes first? I tested localeCompare in JavaScript and café comes before cafe, but I don't understand why.
-
This may be relevant: http://stackoverflow.com/questions/6909126/javascript-sort-with-unicode – MatthewMartin Jul 03 '13 at 18:01
-
Seems to be the other way around for me -> http://jsfiddle.net/xvBWa/ – adeneo Jul 03 '13 at 18:02
2 Answers
Best practice is to sort without diacritics first, i.e. cafe comes before café.
`localeCompare` works by stripping the diacritics, so the sort order doesn't reflect the real words, since café is turned into cafe.
You can read more about localeCompare here:

-
I don't think I understand what you are saying. The confusing part is that the sort order doesn't reflect the real words. – Chris.Stover Jul 03 '13 at 18:08
-
`localeCompare` strips the diacritics, i.e. it turns café into cafe, so when you sort using it, it is sorting cafe against cafe – Martin Jespersen Jul 03 '13 at 19:36
-
I tested this two different ways. The first test was ["café", "cafe"], the second was ["cafe", "café"]. The input order didn't matter: both times café came out first. If it stripped the diacritics, shouldn't the order be preserved? – Chris.Stover Jul 03 '13 at 22:26
-
The problem with localeCompare is that it is entirely implementation dependent, and you have no guarantee that two browsers will behave the same way with this method (though mostly they do). I don't know which browser you made your test in, but in the latest Chrome `'cafe'.localeCompare('café') != 'café'.localeCompare('cafe')`, so yes, the order should be preserved. – Martin Jespersen Jul 04 '13 at 17:59
https://en.wikipedia.org/wiki/Collation
How text is sorted depends on the collation rules that the sort applies.
One tradition is the "US-ASCII" representation of characters, in the C programming language in particular. When text is sorted according to ASCII then the order depends solely on the numerical value of each character in the ASCII specification. Sometimes this is called the "C" locale.
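As a rough illustration of ordering by numeric character value, here is a minimal JavaScript sketch. It assumes the é is the precomposed character U+00E9, and it relies on the default `Array.prototype.sort`, which compares strings by UTF-16 code unit rather than by ASCII or "C" locale bytes, so treat it as an analogy to the "C" locale, not the same thing. The word list is made up for the example.

```javascript
// Hypothetical sample data; "\u00e9" is the precomposed "é" (code unit 233).
const sampleWords = ["caf\u00e9", "cafz", "cafe"];

// With no comparator, sort() orders strings by UTF-16 code unit value,
// so "e" (101) < "z" (122) < "é" (233).
const byCodeUnit = [...sampleWords].sort();

console.log(byCodeUnit); // [ 'cafe', 'cafz', 'café' ]
```

Note that café lands after cafz here, which is rarely what a reader expects.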
Modern software should, usually, use a suitable locale so that the ordering occurs the way people expect it, regardless of the numeric representation of characters used by the computer.
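For comparison, a locale-aware sort in JavaScript might look like the sketch below. It uses the standard `Intl.Collator` API with an assumed "en" locale; the word list is hypothetical, and the exact ordering is ultimately up to the implementation's collation data (the comments reflect ICU-based engines such as current V8 and Node.js).

```javascript
// Hypothetical sample data; "\u00e9" is the precomposed "é".
const wordsToSort = ["cafz", "caf\u00e9", "cafe"];

// A collator for English. The locale tag is an assumption for this example.
const englishCollator = new Intl.Collator("en");

// Accents act as a tie-breaker: base letters are compared first,
// so both "cafe" and "café" sort before "cafz".
const byLocale = [...wordsToSort].sort(englishCollator.compare);
console.log(byLocale); // [ 'cafe', 'café', 'cafz' ] in ICU-based engines

// With sensitivity "base", accents are ignored entirely and the two
// strings compare as equal (result 0).
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });
const baseResult = baseCollator.compare("cafe", "caf\u00e9");
console.log(baseResult); // 0
```

Passing an explicit locale avoids depending on whatever default locale the browser or runtime happens to use.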
