3

The project I'm working on requires me to store javascript numbers(which are doubles) as BLOB primary keys in a database table(can't use the database native float data type). So basically I need to serialize numbers to a byte array in such a way that:

1 - The length of the byte array is 8(which is what is normally needed to serialize doubles)

2 - the byte arrays must preserve natural order so the database will transparently sort rows in the index b-tree.

A simple function that takes a number and returns an array of numbers representing the bytes is what I'm seeking. I prefer the function to be written in javascript but answers in java, C, C#, C++ or python will also be accepted.

Thiago Padilha
  • 4,590
  • 5
  • 44
  • 69

3 Answers3

5

To meet the sorting requirement, you need to:

  1. Use big-endian representation
  2. If the sign bit is 0, flip it to 1. Check the actual bit -- comparing the original number to 0 gives a different result in special cases such as the negative zero.
  3. If the sign bit is 1, flip all bits

Python 3 code:

import struct

def invert(x):
    return bytes(c ^ 255 for c in x)

def tobin(x):
    binx = struct.pack('>d', x)
    if binx > b'\x80': #negative
        return invert(binx)
    else:
        return struct.pack('>d', -x)

data = [float('-inf'), -100.0, -2.0, -.9, -.1, -0.0, 
    0.0, .1, .9, 2.0, 100.0, float('inf'), float('nan')]

print(sorted(data, key=tobin))
#[-inf, -100.0, -2.0, -0.9, -0.1, -0.0, 0.0, 0.1, 0.9, 2.0, 100.0, inf, nan]

On Python 2 change invert to:

def invert(x):
    return "".join(chr(ord(c) ^ 255) for c in x)

For reference, here is the equivalent node.js which already implements a Big Endian serialization function through the 'Buffer' class:

function serialize(n) {
  var buffer = new Buffer(8);
  var l = buffer.length;
  buffer.writeDoubleBE(n, 0);
  if (buffer[0] < 0x80) {
    buffer[0] ^= 0x80;
  } else {
    for (var i = 0; i < l; i++) 
      buffer[i] ^= 0xff;
  }
  return buffer
}

function deserialize(buffer) {
  var l = buffer.length;
  // 0x80 is the most significant byte of the representation of
  // the first positive number(Number.MIN_VALUE)
  if (buffer[0] >= 0x80) { 
    buffer[0] ^= 0x80;
  } else {
    for (var i = 0; i < l; i++) 
      buffer[i] ^= 0xff;
  }
  return buffer.readDoubleBE(0);
}
Janne Karila
  • 24,266
  • 6
  • 53
  • 94
  • Good answer, I have edited it to add the javascript version for future reference – Thiago Padilha Oct 17 '12 at 13:06
  • The javascript solution fails to serialize and deserialize both `NaN` and `-0`. It handles `+0` correctly. I don't see any good ordering for `NaN`, but I think correct serialization/deserialization is desired :) – jonasfj Mar 15 '15 at 06:43
  • @jonasfj Thanks. Does it look correct now? Also the Python version was sorting -0 in the wrong place. – Janne Karila Mar 16 '15 at 06:54
0

DataView provides an API to write doubles in 8 byte containers.

with methods getFloat64() and setFloat64()

The link mentioned apparently doesn't implement setFloat64(), but provides some perspective at least what's to be involved in writing such a function.

https://github.com/vjeux/jDataView

For browsers, the definitely best and fastest approach would be using Float64Array from typed arrays (and fixing possibly the byte order).

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57
0

The obvious answer is to remove the restriction that you can't use the native database type. I can't see any point in it. It's still 8 bytes and it does the ordering for you without any further investigation, work, experiment, testing etc being necessary.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • How I wish I could easily change the requirements of all projects I work on :) – Thiago Padilha Oct 17 '12 at 11:43
  • There is actually many valid use cases for this. Such as keys in Azure Table Storage, AWS DynamoDB, AWS S3, and probably a ton of other NoSQL solutions. They often only have strings as keys, but have them lexicographically ordered... – jonasfj Feb 17 '15 at 08:28