0

I'm looking to hash UUIDs to the unit interval, on a hot path for an application.

This sounds like something which should have already been efficiently solved, but my searches haven't yet turned up a solution. Does anyone know of (and have reference to a sample implementation) a quick/efficient hash of uuids to the unit interval (obviously while preserving as much of their random distribution as possible).

blueberryfields
  • 45,910
  • 28
  • 89
  • 168
  • By "unit interval" do you mean the range of floating point numbers from 0.0-1.0? – Mark Ransom Jul 13 '15 at 16:18
  • Create a 52-bit integer hash from the UUID. Or the value `0x3ff0000000000000` into it, then convert to double - this will give you a number in the range 1.0-2.0. Subtract 1.0 from that. See http://stackoverflow.com/questions/2905556/how-can-i-convert-a-byte-array-into-a-double-and-back for converting a byte array to a double. – Mark Ransom Jul 13 '15 at 18:59

2 Answers2

1

A standard UUID is just a 128-bit value. You could map that to the unit interval via a stereographic projection.

  1. Convert the UUID to a Quadruple-precision floating point number
  2. Map the floating point number to the unit sphere using a stereographic projection.
  3. Scale the point's polar angle on the unit sphere from [0, 2 π] to [0,1]

The projection equations are non-linear so you'll have to be careful with precision loss doing the transformation. I don't know what language you are implementing this but Boost has a library for doing high precision math.

dpmcmlxxvi
  • 1,292
  • 1
  • 9
  • 13
  • i'm on the jvm - it doesn't seem too difficult to implement a hash, just seems to me this should already exist somewhere – blueberryfields Jul 13 '15 at 17:16
  • @blueberryfields Possibly. I've never heard of it. To me a hash and a floating point number are very different things. One is for identifying and the other is for computing, but it's interesting that your combining the two. Are you looking to use the floating point number as an identifier like the hash or just for visualization purposes? If your looking for an ID why not stick with the UUID? – dpmcmlxxvi Jul 13 '15 at 17:28
  • i'm looking to do a bit of both - the uuids i'm looking at are not entirely random, and graphically displaying them can help with data analysis – blueberryfields Jul 13 '15 at 19:05
1

This method keeps "uniqueness level" of UUID by one-to-one bijection of each large number to corresponding float number on interval between [1/maxUUID,1].

After removing hyphens, UUID string is a simple hex string. Convert it to BigInt and divide by maximal possible UUID-number(128 bit ff...fff).

String hexUUID = UUIDstr.replaceAll('-','');
BigDecimal uuid = new BigDecimal(new BigInteger(hexUUID , 16));
BigDecimal maximal = new BigDecimal(new BigInteger("ffff...ff",16)); // compute it only once!!
BigDecimal floatID = uuid.divide(maximal, MathContext.DECIMAL128);

Probably, you have to play with characters case (to lowercase before converting to number), also play with big decimal divide parameters (scale, rounding mode) but the main idea is presented in code above.

blueberryfields
  • 45,910
  • 28
  • 89
  • 168