
I would like to use str1 < str2 ? -1 : str1 > str2 for alphabetic ordering.

However, my input strings represent large integers (in hexadecimal format, for what it's worth).

Is it safe to use the above comparison even when any of the input strings represent a numeric value larger than Number.MAX_SAFE_INTEGER, or will NodeJS cast those String objects to Number objects before comparing them (in which case, I'd be at risk of data-loss)?

halfer
goodvibration

2 Answers


No, it's not safe to use for numeric ordering, and that doesn't even depend on Number.MAX_SAFE_INTEGER. Comparing two strings with < or > always uses lexicographical ordering:

console.log("10" < "2"); //true
console.log(10 < 2); //false

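Instead, you can use String#localeCompare with the numeric option. This uses numeric collation, which compares runs of decimal digits by their numeric value without converting the strings to Number, so it also works for values above Number.MAX_SAFE_INTEGER. Note that it only recognizes decimal digits, so it will not order hexadecimal strings by value: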

console.log("10".localeCompare("2", undefined, {numeric: true})); // `1`: "10" is larger

const tooLargeA = "9999999999999999";
const tooLargeB = "10000000000000000";

console.log(tooLargeA);         // 9999999999999999
console.log(Number(tooLargeA)); // 10000000000000000

console.log(
  `${tooLargeA}  <  ${tooLargeB}:`, 
  Number(tooLargeA) < Number(tooLargeB)
); // false
console.log(
  `${tooLargeA} === ${tooLargeB}:`,  
  Number(tooLargeA) === Number(tooLargeB)
); // true
console.log(
  `${tooLargeA}  >  ${tooLargeB}:`,
  Number(tooLargeA) > Number(tooLargeB)
); // false

console.log(
  `"${tooLargeA}".localeCompare("${tooLargeB}", undefined, {numeric: true})`,
  tooLargeA.localeCompare(tooLargeB, undefined, {numeric: true})
); // `-1`: "9999999999999999" is smaller
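As a minimal sketch of how the numeric option could be used for sorting, the snippet below passes localeCompare as a comparator to Array.prototype.sort. The names decimalStrings, byNumericCollation and sortedNumerically are made up for illustration, and the inputs are assumed to be plain decimal digit strings:

```javascript
// Hypothetical sample data: decimal strings, two of them above Number.MAX_SAFE_INTEGER.
const decimalStrings = ["10", "9999999999999999", "2", "10000000000000000", "9"];

// Comparator that relies on numeric collation instead of converting to Number.
const byNumericCollation = (a, b) =>
  a.localeCompare(b, undefined, { numeric: true });

// Copy before sorting so the input array is left untouched.
const sortedNumerically = [...decimalStrings].sort(byNumericCollation);

console.log(sortedNumerically);
// [ '2', '9', '10', '9999999999999999', '10000000000000000' ]
```

The two largest values stay distinct and correctly ordered here, whereas Number() would have collapsed them into the same value.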

If you're just looking for lexicographical sorting, then the size of the string and the type of characters in it are irrelevant: "The quick brown fox jumps over the lazy dog" can be compared to anything and the result will be correct, so the string "1234567890123456789012345678901234567890123" is not treated any differently.

const tooLargeA = "9999999999999999";
const tooLargeB = "10000000000000000";

console.log(
  `"${tooLargeA}"  <  "${tooLargeB}":`, 
  tooLargeA < tooLargeB
); // false
console.log(
  `"${tooLargeA}" === "${tooLargeB}":`,  
  tooLargeA === tooLargeB
); // false
console.log(
  `"${tooLargeA}"  >  "${tooLargeB}":`,
  tooLargeA > tooLargeB
); // true
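To tie this back to the question, here is a small sketch that sorts hexadecimal strings with a comparator equivalent to the one in the question (written out with explicit 1 and 0 results). The sample values and the names compareStrings, hexValues and sortedHex are assumptions for illustration only:

```javascript
// Same idea as the question's comparator, with explicit numeric results.
const compareStrings = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical hex strings; two are far larger than Number.MAX_SAFE_INTEGER.
const hexValues = ["ffffffffffffffffffff", "10000000000000000000000", "0a", "a"];

const sortedHex = [...hexValues].sort(compareStrings);

console.log(sortedHex);
// [ '0a', '10000000000000000000000', 'a', 'ffffffffffffffffffff' ]

// The values are still the original strings; no digits were lost.
console.log(sortedHex.every((s) => typeof s === "string")); // true
```

The order follows the first differing character ("0" < "1" < "a" < "f"), not the numeric value of the hex strings, which is exactly the alphabetic ordering the question asks for.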
VLAZ
  • I'm looking to make alphabetic ordering ("10" indeed comes before "2"). So in my perspective, it is safe if and only if the input strings are **not** converted to numbers. I do not care about the corresponding number comparison (**unless** it eventually takes place "against my will"). – goodvibration Dec 29 '19 at 18:15
  • In that case, it doesn't matter what strings you compare. String comparison is string comparison and will just be using lexicographical sorting of characters. – VLAZ Dec 29 '19 at 18:17
  • That's exactly what I want to assert - that NodeJS doesn't "decide" to convert them to numbers just because it can. – goodvibration Dec 29 '19 at 18:20
  • @goodvibration I've added this to the answer. But no - it makes no sense to convert the string to a number before comparing. It results in very strange behaviour - what would you get when you do `"10" < "2"` - `true` (string comparison) or `false` (numeric sorting)? And what if you have `"10" < "a"` - would `"a"` be converted to `NaN` in which case the result would be `false` as would `"10" === "a"` as well as `"10" > "a"`? It makes no sense to do any conversion when you're comparing strings. Unless you opt into it with the numeric collation. – VLAZ Dec 29 '19 at 18:27

According to ECMA-262

When using 'str1' < 'str2':

7.2.13 Abstract Relational Comparison

If Type(px) is String and Type(py) is String, then

    If IsStringPrefix(py, px) is true, return false.
    If IsStringPrefix(px, py) is true, return true.
    Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
    Let m be the integer that is the numeric value of the code unit at index k within px.
    Let n be the integer that is the numeric value of the code unit at index k within py.
    If m < n, return true. Otherwise, return false.

At no point are the strings converted to integers (at least not in the way you mean); only the numeric values of individual code units are compared.

The algorithm would be something like:

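function lessThan(str1, str2) {
  // Equal strings, or str2 is a prefix of str1: str1 is not smaller.
  if (str1.startsWith(str2)) return false;
  // str1 is a proper prefix of str2: str1 is smaller.
  if (str2.startsWith(str1)) return true;

  // Find the first index where the code units differ.
  // It must exist, because neither string is a prefix of the other.
  let k = 0;
  while (str1.charCodeAt(k) === str2.charCodeAt(k)) {
    ++k;
  }
  // The spec compares UTF-16 code units, hence charCodeAt rather than codePointAt.
  return str1.charCodeAt(k) < str2.charCodeAt(k);
}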

So you can use the < operator with strings on both sides (whether they hold an "int" or not).
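As a rough check of that claim, the following sketch compares the built-in < operator against a hand-written version of the quoted spec steps for a few sample pairs. The helper name codeUnitLessThan and the sample pairs are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper that mirrors the quoted spec steps for two strings.
function codeUnitLessThan(px, py) {
  if (px.startsWith(py)) return false; // py is a prefix of px (or they are equal)
  if (py.startsWith(px)) return true;  // px is a proper prefix of py
  let k = 0;
  // Advance to the first index where the UTF-16 code units differ.
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++;
  return px.charCodeAt(k) < py.charCodeAt(k);
}

// Sample pairs, including digit-only strings above Number.MAX_SAFE_INTEGER.
const pairs = [
  ["10", "2"],
  ["10", "100"],
  ["ff", "FF"],
  ["9999999999999999", "10000000000000000"],
  ["abc", "abc"],
];

const results = pairs.map(([a, b]) => ({
  a,
  b,
  builtin: a < b,                 // what the engine does
  manual: codeUnitLessThan(a, b), // what the spec steps describe
}));

console.table(results);
```

For every pair the two columns agree, including the digit-only strings, which suggests the engine is not treating them as numbers.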

Now, does it make sense to get '10' < '100'? Likely not, but it depends on your use case.

grodzi