I am looking for a deterministic way of sorting a list of strings.

Sorting strings of course often leads to the suggestion to use String.localeCompare. But the order must be deterministic, independent of the computer the code is running on.

The hardcore solution I came up with is hashing each string and comparing the hashes instead, with the locale option en. Is there an easier solution?

The strings can be English, German, Chinese, Japanese, ...

HelloWorld

2 Answers


Oddly, what fits your requirements is...the default sort:

theStrings.sort();

That sorts according to the UTF-16 code units in the strings, which doesn't vary by computer/locale/whatever. It treats each string as (effectively) a series of 16-bit numbers (UTF-16 code units, to be precise).
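To make the code-unit ordering concrete, here is a minimal sketch. The sample strings are made up for illustration and are not from the question:

```javascript
// Sample data (illustrative): ASCII, German and CJK strings.
// "\u00C4" is the precomposed letter Ä, written as an escape to avoid ambiguity.
const theStrings = ["zebra", "\u00C4pfel", "apple", "Zoo", "日本", "中文"];

// Default sort: no comparator, so elements are compared by UTF-16 code units.
const sorted = [...theStrings].sort();

console.log(sorted);
// → ["Zoo", "apple", "zebra", "Äpfel", "中文", "日本"]
```

Note that the result is stable across machines but not "alphabetical" in the human sense: uppercase ASCII letters come before lowercase ones, and accented letters come after all of ASCII.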

From the specification:

If comparefn is not undefined, it should be a function that accepts two arguments x and y and returns a negative Number if x < y, a positive Number if x > y, or a zero otherwise.

And the < and > operators are defined by the abstract IsLessThan operation in the specification, which compares by the code units in the string.

T.J. Crowder
  • Lol! I thought it implied a `localeCompare`. On the other hand, I actually have a `Map` object whose keys I need to sort, which means I have to compare the actual keys: `new Map(Array.from(myMap.entries()).sort((a, b) => ???))` – HelloWorld Aug 14 '21 at 18:11
  • @HelloWorld - It was perfectly reasonable (IMHO) to assume the default sort, which is defined in string terms, would use `localeCompare`. It doesn't, but it was a completely reasonable assumption. :-D – T.J. Crowder Aug 14 '21 at 18:13
  • Thanks! I think your post edit filled the last gap. I can rely on `<` and `>` – HelloWorld Aug 14 '21 at 18:14
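
For the `Map` case raised in the comments, one possible way to fill in the `???` is to compare the keys with `<` and `>`. This is only a sketch: the sample entries are invented, and it assumes all keys are strings.

```javascript
// Hypothetical sample map; the keys are assumed to be strings.
const myMap = new Map([["b", 2], ["\u00E4", 3], ["a", 1], ["B", 4]]);

// Each entry is a [key, value] pair, so destructure the key and compare
// by UTF-16 code units using the built-in string operators.
const sortedMap = new Map(
  Array.from(myMap.entries()).sort(
    ([keyA], [keyB]) => +(keyA > keyB) || -(keyB > keyA)
  )
);

console.log([...sortedMap.keys()]);
// → ["B", "a", "b", "ä"]
```

An explicit comparator is needed here because the default sort would stringify each whole `[key, value]` entry rather than compare the keys alone.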

Two solutions:

  • use a specific locale rather than the current one, which is what localeCompare uses by default. JS supports this through Intl.Collator:

    arr.sort(new Intl.Collator('en').compare)
    

    Choose whatever language you need.

  • use the standard lexicographic comparison that the built-in < and > operators for strings supply:

    arr.sort((a, b) => +(a > b) || -(b > a)) // explicit comparator built on < and >
    arr.sort()                               // default sort: same order for an array of strings
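
A small side-by-side sketch of the two approaches, using a made-up input array. The collator's result is deliberately not spelled out here, since it follows the collation rules of the chosen locale rather than code units:

```javascript
// Illustrative input, not from the question.
const arr = ["b", "a", "B", "A"];

// Code-unit comparison via the built-in operators.
const byCodeUnit = (a, b) => +(a > b) || -(b > a);
const viaComparator = [...arr].sort(byCodeUnit);
const viaDefault = [...arr].sort();

console.log(viaComparator); // → ["A", "B", "a", "b"]
console.log(viaDefault);    // → ["A", "B", "a", "b"]

// Fixed-locale comparison: the order is defined by the "en" collation
// rules rather than by code units, so it can differ from the above.
const viaCollator = [...arr].sort(new Intl.Collator("en").compare);
console.log(viaCollator);
```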
    
Bergi