2

I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers. This is based on something I read (and now can't find), which appears to be erroneous, which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this:

JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it.

If we define the "size" of an element (number or string) as the length of its textual representation (so size = String(x).length for either a numeric element or a string element), then for a large array of elements of the same size (one numeric, and one strings), I was expecting the sorting of the strings to be equal to or slightly slower than sorting the arrays, but when I ran a simple test (code below), it turned out that the strings were about twice as fast to sort.

I want to know what it is about strings and numbers, and how javascript does its sorting, that makes string sorting faster than numeric sorting. Perhaps there is something I am misunderstanding.

Results:

~/sandbox > node strings-vs-ints.js 10000 16
Sorting 10000 numbers of magnitude 10^16
Sorting 10000 strings of length 16
Numbers: 18
Strings: 9
~/sandbox > node strings-vs-ints.js 1000000 16
Sorting 1000000 numbers of magnitude 10^16
Sorting 1000000 strings of length 16
Numbers: 3418
Strings: 1529
~/sandbox > node strings-vs-ints.js 1000000 32
Sorting 1000000 numbers of magnitude 10^32
Sorting 1000000 strings of length 32
Numbers: 3634
Strings: 1474

Source:

"use strict";
const CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghjijklmnopqrstuvwxyz0123456789:.";

function generateString(L) {
    const chars = [];
    while(chars.length < L) {
        chars.push(CHARSET[Math.floor(Math.random() * CHARSET.length)]);
    }
    return chars.join("");
}

function generateNumber(L) {
    return Math.floor(Math.random() * Math.pow(10, (L - 1))) + Math.pow(10, L - 1);
}

function generateList(generator, L, N) {
    const elements = [];
    while(elements.length < N) {
        elements.push(generator.call(null, L));
    }
    return elements;
}

function now() {
    return Date.now();
}

function getTime(baseTime) {
    return now() - baseTime;
}

function main(count, size) {
    console.log(`Sorting ${count} numbers of magnitude 10^${size}`);
    const numbers = generateList(generateNumber, size, count);
    const numBaseTime = now();
    numbers.sort();
    const numTime = getTime(numBaseTime);

    console.log(`Sorting ${count} strings of length ${size}`);
    const strings = generateList(generateString, size, count);
    const strBaseTime = now();
    strings.sort();
    const strTime = getTime(strBaseTime);

    console.log(`Numbers: ${numTime}\nStrings: ${strTime}`);
}

main(process.argv[2], process.argv[3]);
GTF
  • 8,031
  • 5
  • 36
  • 59
  • 1
    Strings - No need to find the weight. Numbers - You need to find the weight and based on that you need to sort. Like considering `"500", "1000"` vs. `500, 1000`. Former is swapped while latter is right. This is just my understanding of why it might be this way. It is may or may not be correct. – Praveen Kumar Purushothaman Jun 29 '17 at 10:57
  • 8
    The obvious problem is [that you are sorting your numbers as strings](https://stackoverflow.com/q/1063007/1048572). – Bergi Jun 29 '17 at 11:01
  • Notice that floating point numbers in a magnitude of `10^32` will be using scientific notation anyway for stringification. There's only 64 bits of entropy, you can't get an arbitrarily long string representation. – Bergi Jun 29 '17 at 11:08
  • @Bergi is right, if you don't provide a callback to the `sort` function then it proceeds with the default sorting which coerces the values to be compared into strings (if they are not already a string type). – Redu Jun 29 '17 at 11:59
  • Ah ok, thanks @Bergi. I'll re-run my test and see if that changes things significantly. – GTF Jun 29 '17 at 15:40
  • As predicted, @Bergi, that completely changed the numbers (no pun intended). Numbers now sort 3x faster than strings. If you want to add your comment as an actual answer I'll mark it as the correct one. – GTF Jun 29 '17 at 15:49
  • @PraveenKumar I'm not sure what "weight" you're talking about, but none of that seems to make sense to me. – user229044 Jul 02 '17 at 21:14
  • @meagar Think in two ways... When it comes to Strings, `"500"` will come after `"1000"` because `1` comes before `5` in case of strings. Whereas, as weight, I am talking about their values. Value-wise `1000` comes after `500`. Sorry, I don't know how to explain this better. Does it make sense now? – Praveen Kumar Purushothaman Jul 02 '17 at 22:35
  • @meagar Actually, Bergi explained in a better way of what I said. LoL. – Praveen Kumar Purushothaman Jul 02 '17 at 22:35
  • @PraveenKumar No, that makes no sense at all. 1000 comes after 500 and that's a single `<` comparison, exactly like sorting `"1"` before `"5"`. Numbers are much cheaper to sort than strings, there is no "weight" involved. The problem here had nothing to do with "finding weight". – user229044 Jul 03 '17 at 12:50
  • @meagar Eh, yeah, I meant the same thing but I guess I confused and swapped numbers and strings... Now I am totally confused! `:(` – Praveen Kumar Purushothaman Jul 03 '17 at 13:54

1 Answers1

6

I was investigating a preconception I had that sorting strings in javascript would be slower than sorting integers.

That's true, string comparison is way more costly than number comparison.

This is based on something I read which stated that javascript stores strings as Array<Array<int>> instead of just Array<int>. The MDN documentation seems to contradict this.

Yes, what you read seems to be erroneous indeed. Strings are just sequences of characters (each character being a 16 bit value), so they're usually stored as arrays of integers, or rather pointers to them. Your array of strings could indeed be treated as an array of an arrays though.

When I ran a simple test, it turned out that the strings were about twice as fast to sort.

The problem with your code is that you are sorting your numbers as strings, which casts every number to a string and then compares that. See How to sort an array of integers correctly. When you fix that, notice that the call to the comparison function still has quite a bit of overhead to the builtin string comparison, so if you really benchmarked relational operators (<, ==, >) on the different types I'd expect numbers to perform even better.

GTF
  • 8,031
  • 5
  • 36
  • 59
Bergi
  • 630,263
  • 148
  • 957
  • 1,375