
*"Efficient" here basically means smaller size (to reduce the IO waiting time) and speedy retrieval/deserialization times. Storing times are not as important.*

I have to store a couple of dozen arrays of integers, each with 1800 values in the range 0-50, in the browser's localStorage -- that is, as a string.

Obviously, the simplest method is to just JSON.stringify it; however, that adds a lot of unnecessary information, considering that the range of the data is well known. The average size for one of these arrays is then ~5500 bytes.
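The baseline is easy to reproduce with a quick sketch (the array contents here are made up; only the serialized size is of interest):

```javascript
// Rough reproduction of the JSON baseline (my own sketch):
// 1800 pseudo-random values in 0-50, serialized with JSON.stringify.
var arr = [];
for (var i = 0; i < 1800; i++) {
    arr.push(Math.floor(Math.random() * 51));
}
var json = JSON.stringify(arr);
// json.length comes out a little over 5000 characters for uniformly
// random data, in the same ballpark as the ~5500 bytes quoted above
```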

Here are some other methods I've tried (resultant size, and time to deserialize it 1000 times, are in the results table below):

  • zero-padding the numbers so each was 2 characters long, eg:

    [5, 27, 7, 38] ==> "05270738"
    
  • base 50 encoding it:

    [5, 11, 7, 38] ==> "5b7C"
    
  • just using the value as a character code (adding 32 to avoid the weird control characters at the start):

    [5, 11, 7, 38] ==> "%+'F" (String.fromCharCode(37), String.fromCharCode(43) ...)
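For concreteness, the charCode method can be sketched like this (the function names are my own):

```javascript
// Sketch of the charCode method: each value 0-50 becomes one character,
// offset by 32 so the result starts at '%' instead of a control character.
function encodeCharCodes(values) {
    return values.map(function (v) {
        return String.fromCharCode(v + 32);
    }).join('');
}

function decodeCharCodes(str) {
    var out = [];
    for (var i = 0; i < str.length; i++) {
        out.push(str.charCodeAt(i) - 32);
    }
    return out;
}

// encodeCharCodes([5, 11, 7, 38]) === "%+'F"
```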
    

Here are my results:

                  size     Chrome 18   Firefox 11
-------------------------------------------------
JSON.stringify    5286          60ms         99ms
zero-padded       3600         354ms        703ms
base 50           1800         315ms        400ms
charCodes         1800          21ms        178ms

My question is: is there an even better method I haven't yet considered?

Update
MДΓΓБДLL suggested using compression on the data. I combined this LZW implementation with the base 50 and charCode data, and also tested aroth's code (packing 4 integers into 3 bytes). I got these results:

                  size     Chrome 18   Firefox 11
-------------------------------------------------
LZW base 50       1103         494ms        999ms
LZW charCodes     1103         194ms        882ms
bitpacking        1350        2395ms        331ms
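For reference, a minimal LZW sketch (not the exact implementation linked above, but the same idea, assuming the input string uses only 8-bit character codes) looks like this:

```javascript
// Minimal LZW over a string whose characters are all below code 256
// (true for the base 50 and charCode encodings above).
function lzwCompress(s) {
    var dict = {}, out = [], next = 256;
    for (var i = 0; i < 256; i++) dict[String.fromCharCode(i)] = i;
    var w = '';
    for (var j = 0; j < s.length; j++) {
        var c = s.charAt(j);
        if (dict.hasOwnProperty(w + c)) {
            w += c;                      // extend the current phrase
        } else {
            out.push(dict[w]);           // emit code for known phrase
            dict[w + c] = next++;        // add the new phrase
            w = c;
        }
    }
    if (w !== '') out.push(dict[w]);
    return out.map(function (code) { return String.fromCharCode(code); }).join('');
}

function lzwDecompress(s) {
    var dict = {}, next = 256;
    for (var i = 0; i < 256; i++) dict[i] = String.fromCharCode(i);
    var w = dict[s.charCodeAt(0)], result = w;
    for (var j = 1; j < s.length; j++) {
        var k = s.charCodeAt(j), entry;
        if (dict.hasOwnProperty(k)) {
            entry = dict[k];
        } else {
            entry = w + w.charAt(0);     // special case: code not yet defined
        }
        result += entry;
        dict[next++] = w + entry.charAt(0);
        w = entry;
    }
    return result;
}
```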
nickf

  • Try combining base 50 or charCodes method with deflate/gzip. I'm not sure that there's even enough data for compression to be worth the overhead, but it can't hurt to check. – Matt Ball Apr 12 '12 at 00:16
  • @MДΓΓБДLL I just tried it with LZW compression... updating the results table now – nickf Apr 12 '12 at 00:21
  • 61% reduction, I wouldn't have expected that - not bad! – Matt Ball Apr 12 '12 at 00:41
  • A single JavaScript character takes up two bytes (per the [ecma specification](http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf)). Make sure to factor that in when you figure the size of the data. – gilly3 Apr 13 '12 at 07:55

3 Answers


If your range is 0-50, then you can pack 4 numbers into 3 bytes (6 bits per number). This would allow you to store 1800 numbers using ~1350 bytes. This code should do it:

window._firstChar = 48;

window.decodeArray = function(encodedText) {
    var result = [];
    var temp = [];

    for (var index = 0; index < encodedText.length; index += 3) {
        //skipping bounds checking because the encoded text is assumed to be valid
        var firstChar = encodedText.charCodeAt(index) - _firstChar;
        var secondChar = encodedText.charCodeAt(index + 1) - _firstChar;
        var thirdChar = encodedText.charCodeAt(index + 2) - _firstChar;

        temp.push((firstChar >> 2) & 0x3F);    //6 bits, 'a'
        temp.push(((firstChar & 0x03) << 4) | ((secondChar >> 4) & 0xF));  //2 bits + 4 bits, 'b'
        temp.push(((secondChar & 0x0F) << 2) | ((thirdChar >> 6) & 0x3));  //4 bits + 2 bits, 'c'
        temp.push(thirdChar & 0x3F);  //6 bits, 'd'

    }

    //filter out 'padding' numbers, if present; this is an extremely inefficient way to do it
    for (var index = 0; index < temp.length; index++) {
        if(temp[index] != 63) {
            result.push(temp[index]);
        }            
    }

    return result;
};

window.encodeArray = function(array) {
    var encodedData = [];

    for (var index = 0; index < array.length; index += 4) {
        var num1 = array[index];
        //63 pads incomplete groups; it cannot collide with real values (max 50)
        var num2 = index + 1 < array.length ? array[index + 1] : 63;
        var num3 = index + 2 < array.length ? array[index + 2] : 63;
        var num4 = index + 3 < array.length ? array[index + 3] : 63;

        encodeSet(num1, num2, num3, num4, encodedData);
    }

    return encodedData.join(''); //join the characters into a single string for storage
};

window.encodeSet = function(a, b, c, d, outArray) {
    //we can encode 4 numbers in 3 bytes
    var firstChar = ((a & 0x3F) << 2) | ((b >> 4) & 0x03);   //6 bits for 'a', 2 from 'b'
    var secondChar = ((b & 0x0F) << 4) | ((c >> 2) & 0x0F);  //remaining 4 bits from 'b', 4 from 'c'
    var thirdChar = ((c & 0x03) << 6) | (d & 0x3F);          //remaining 2 bits from 'c', 6 bits for 'd'

    //add _firstChar so that all values map to a printable character
    outArray.push(String.fromCharCode(firstChar + _firstChar));
    outArray.push(String.fromCharCode(secondChar + _firstChar));
    outArray.push(String.fromCharCode(thirdChar + _firstChar));
};

Here's a quick example: http://jsfiddle.net/NWyBx/1

Note that storage size can likely be further reduced by applying gzip compression to the resulting string.

Alternately, if the ordering of your numbers is not significant, then you can simply do a bucket-sort using 51 buckets (assuming 0-50 includes both 0 and 50 as valid numbers) and store the counts for each bucket instead of the numbers themselves. That would likely give you better compression and efficiency than any other approach.
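If order really doesn't matter, the bucket idea could be sketched as follows (my own illustration, not the answer's tested code); note the encoded string is always exactly 51 characters, regardless of how many values you store:

```javascript
// Count occurrences into 51 buckets (values 0-50), then encode each count
// as a single character, reusing the offset-by-32 trick; counts up to
// 65503 still fit in one UTF-16 code unit, so 1800 values are no problem.
function toBucketString(values) {
    var counts = [];
    for (var i = 0; i <= 50; i++) counts.push(0);
    for (var j = 0; j < values.length; j++) counts[values[j]]++;
    return counts.map(function (c) { return String.fromCharCode(c + 32); }).join('');
}

function fromBucketString(str) {
    var values = [];
    for (var v = 0; v < str.length; v++) {
        var count = str.charCodeAt(v) - 32;
        for (var k = 0; k < count; k++) values.push(v);
    }
    return values; // comes back sorted; the original ordering is lost
}
```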

aroth
  • I had this same idea after I went to bed last night, but thankfully you saved me the headache of actually figuring out the implementation. I've added the benchmarking results to the table in the OP. Unfortunately, it seems that Javascript's bitwise operations aren't very optimised at all. – nickf Apr 12 '12 at 09:04
  • Also, the order is important, so the buckets idea is a no-go sadly. – nickf Apr 12 '12 at 09:04
  • *Update*: **Chrome's** bitwise operations aren't optimised. It seems Firefox does really well at it (in comparison at least). – nickf Apr 12 '12 at 09:15
  • A couple of things: First, you don't get 256 printable chars in a row until `U+0100`. The range from `U+0080` to `U+00ff` are control characters. So, offsetting by 48 doesn't give you only printable characters in your encoded string. Look at [your jsfiddle](http://jsfiddle.net/NWyBx/1) - The alert says "12 bytes", but I count only 11 characters. Second, A JavaScript string is stored as an Array of 16-bit integers. You'll have to pack 8 numbers into 6 bytes (3 characters). You will be unable to construct your encoded string using only printable characters unless you only use half the bits. – gilly3 Apr 12 '12 at 22:13

Assuming (as in your test) that compression takes more time than the size reduction saves you, your char encoding is the smallest you'll get without bitshifting. You're currently using one byte for each number, but if they're guaranteed to be small enough you could put two numbers in each byte. That would probably be an over-optimization, unless this is a very hot piece of your code.

jimw
  • Numbers between 0-50 take up 6 bits. The best you can hope for is 1.33 numbers per byte. You could easily pack two numbers into a single UTF-16 code point though, as described [here](http://stackoverflow.com/a/7411549/26003). – Richard Poole Apr 12 '12 at 00:46
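The two-values-per-code-unit idea from the comment could look something like this (my own sketch; 63 serves as padding for odd-length input, since real values never exceed 50, and the offset of 0x100 skips the control ranges mentioned elsewhere on this page):

```javascript
// Pack two 6-bit values (0-50) into one UTF-16 code unit each,
// offset by 0x100 to stay clear of the C0/C1 control characters.
function encodePairs(values) {
    var chars = [];
    for (var i = 0; i < values.length; i += 2) {
        var hi = values[i];
        var lo = (i + 1 < values.length) ? values[i + 1] : 63; // 63 = padding
        chars.push(String.fromCharCode(((hi << 6) | lo) + 0x100));
    }
    return chars.join('');
}

function decodePairs(str) {
    var out = [];
    for (var i = 0; i < str.length; i++) {
        var n = str.charCodeAt(i) - 0x100;
        out.push((n >> 6) & 0x3F);
        var lo = n & 0x3F;
        if (lo !== 63) out.push(lo); // drop the padding value
    }
    return out;
}
```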

You might want to consider using Uint8Array or ArrayBuffer. This blogpost shows how it's done. Copying his logic, here's an example, assuming you have an existing Uint8Array named arr.

function arrayBufferToBinaryString(buffer, cb) {
    var blobBuilder = new BlobBuilder();
    blobBuilder.append(buffer);
    var blob = blobBuilder.getBlob();
    var reader = new FileReader();
    reader.onload = function (e) {
        cb(reader.result);
    };
    reader.readAsBinaryString(blob);
}
arrayBufferToBinaryString(arr.buffer, function(s) { 
  // do something with s
});
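A simpler, synchronous alternative (my own sketch, avoiding the asynchronous FileReader) is to map the bytes straight to character codes:

```javascript
// Each byte of the Uint8Array becomes one character code (0-255), and back.
function uint8ToBinaryString(arr) {
    var chars = [];
    for (var i = 0; i < arr.length; i++) {
        chars.push(String.fromCharCode(arr[i]));
    }
    return chars.join('');
}

function binaryStringToUint8(str) {
    var arr = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        arr[i] = str.charCodeAt(i);
    }
    return arr;
}
```

Bear in mind that strings are stored as UTF-16, so each byte still occupies two bytes of storage unless you also pack two bytes per code unit.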
Yuval