I am working with some data in Node.js that I need to encode in a binary format. Internally I use Node.js Buffers for this, but when I serialize the data, which encoding is best to use? I am currently using the 'binary' encoding, but this is marked as deprecated in the documentation. Is there a better choice? I am looking to use as little space as possible in my representation.
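To make the conversion concrete, this is roughly the pattern in question (a minimal sketch using the current Buffer API; the sample bytes are just for illustration):

```js
// Roughly the conversion in question: arbitrary bytes in a Buffer are turned
// into a string for storage/transmission, then decoded back with the same encoding.
const buf = Buffer.from([0x00, 0x7f, 0x80, 0xff]); // sample bytes, illustration only

const asBase64 = buf.toString('base64'); // "AH+A/w=="
const asBinary = buf.toString('binary'); // deprecated alias for 'latin1'
const asHex    = buf.toString('hex');    // "007f80ff"

// Decoding must use the same encoding that was used to encode:
const roundTrip = Buffer.from(asBase64, 'base64');
console.log(roundTrip.equals(buf)); // true
```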
-
Explain what you mean by serializing binary data. If you need to transmit the data in a text-based protocol, what encoding is used in that protocol? Do you want to include the buffers in JSON? If you "serialize" to disk, just `fs.writeFile` the `Buffer`. – windm Nov 03 '14 at 17:46
-
I am serializing to a redis database, which can only handle strings, though they are 'binary safe'. – Max Ehrlich Nov 03 '14 at 17:54
-
http://stackoverflow.com/questions/20732332/how-to-store-a-binary-object-in-redis-using-node – windm Nov 03 '14 at 18:19
-
Yeah, I've read that. My question is asking what the most space-efficient encoding is to use. Should I assume from your answer that it is base64? – Max Ehrlich Nov 03 '14 at 18:34
1 Answer
In an effort to get a thorough answer to this, I ran a few tests using my data. My data consists of sets of 4096-element number arrays. I used two set sizes, one with 100 arrays and the other with 5000 arrays. These were serialized to a redis cache as lists, with each element of the redis list being a single serialized array. The size of the key redis was using for the list was then read off by running `debug object` and examining the `serializedLength` property. The results are summarized in the tables below.
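Roughly, the measurement for each encoding looked like the sketch below. This is a simplified illustration, not the exact benchmark code: it assumes the node-redis v4 client, it packs each number as a 64-bit float (an assumption about the original data), and the key name is made up.

```js
// Simplified sketch of the measurement (not the exact benchmark code).
// Assumes the node-redis v4 client; packing each number as a 64-bit float
// and the key name are illustrative assumptions.
const { createClient } = require('redis');

// Build one 4096-element array of numbers (random values for illustration).
function randomArray(length) {
  return Array.from({ length }, () => Math.random() * 1000);
}

async function measure(encoding, sampleCount) {
  const client = createClient();
  await client.connect();

  const key = `encoding-test:${encoding}`; // hypothetical key name
  await client.del(key);

  for (let i = 0; i < sampleCount; i++) {
    const arr = randomArray(4096);
    // JSON is not a Buffer encoding, so it is handled separately.
    const value = encoding === 'json'
      ? JSON.stringify(arr)
      : Buffer.from(new Float64Array(arr).buffer).toString(encoding);
    await client.rPush(key, value); // one serialized array per redis list element
  }

  // DEBUG OBJECT reports a serializedlength field for the key.
  const info = await client.sendCommand(['DEBUG', 'OBJECT', key]);
  console.log(encoding, info);

  await client.quit();
}

measure('base64', 100).catch(console.error);
```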
100 samples

| encoding | size (bytes) |
| --- | ---: |
| base64 | 4,177,241 |
| binary | 4,162,398 |
| hex | 4,669,965 |
| JSON | 2,271,670 |
| utf16le* | 4,543,605 |
| utf8* | 3,640,132 |
| ascii* | 2,929,850 |
5000 samples

| encoding | size (bytes) |
| --- | ---: |
| base64 | 213,317,603 |
| binary | 213,433,150 |
| hex | 238,609,493 |
| JSON | 115,733,172 |
| utf16le* | 232,032,313 |
| utf8* | 185,279,730 |
| ascii* | 149,860,001 |
\* text encodings were provided for completeness and should not be used on real data
Some things to note about these results:
- JSON encoding won in both tests, and by a large margin. This seems odd to me, since it expands the data by adding brackets and quotes; I would love to know the reason for this.
- Memory consumption for each case should be `O(n*d)`, where `n` is the number of elements and `d` is the number of data samples. Memory consumption for the JSON case, however, should be `O(c*d)`, where `c` is the average number of digits in the numbers.
- `binary` encoding beats `base64` encoding on the 100 sample set but not on the 5000 sample set.
- The text encodings (`utf16le`, `utf8`, `ascii`, all marked with a *) should not be used for real data and were included for completeness' sake. `utf8` actually crashed during deserialization, and `ascii` is known to strip the high bit of any value [1].
- The field used for these tests (`serializedLength`) may be a poor indicator of the actual size of a key [2]. However, since all we care about here is the relationship between the sizes of the different encodings, these results should still be useful.
Hopefully someone will find this information useful. I will be switching to JSON for my project; it seems a little weird, but the numbers don't lie.
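For reference, the JSON round trip for a single array is straightforward (a minimal sketch; the array contents and variable names are just for illustration):

```js
// Minimal JSON round trip for one 4096-element number array (illustration only).
const arr = Array.from({ length: 4096 }, (_, i) => i % 256);

const serialized = JSON.stringify(arr);    // this string is stored as a redis list element
const restored   = JSON.parse(serialized); // back to a plain array of numbers

console.log(restored.length === arr.length);          // true
console.log(restored.every((v, i) => v === arr[i]));  // true
```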
