Taking the MD5 of a string as a 128-bit representation of an integer x, how do I compute x % y in Google BigQuery, where y will typically be relatively small (approximately 1000)?
BigQuery has an MD5 function, returning a result of type BYTES with 16 bytes (i.e. 128 bits).
(Background: this is to compute deterministic pseudo random numbers. However, for legacy and compatibility reasons, I have no flexibility in the algorithm! Even though we know it has a (very slight) bias.)
This needs to be done millions or billions of times every day, for different input strings and different moduli, so hopefully it can be done efficiently. As a fallback, I can compute it externally in another language and upload the results to BigQuery afterwards; but it would be great if I could do this directly in BigQuery.
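For concreteness, here is a sketch (in Python, as one possible external fallback, not a BigQuery solution) of the exact computation I need: interpret the 16-byte digest as a big-endian 128-bit integer and reduce it mod y.

```python
import hashlib

def md5_mod(s: str, y: int) -> int:
    """Interpret the 16-byte MD5 digest of s as a big-endian
    128-bit integer x, then return x % y."""
    digest = hashlib.md5(s.encode("utf-8")).digest()  # 16 bytes
    x = int.from_bytes(digest, byteorder="big")       # full 128-bit value
    return x % y

# Deterministic pseudo-random value in [0, 1000)
print(md5_mod("example", 1000))
```

The byte order matters for compatibility: reversing it (little-endian) generally gives a different residue, so whatever BigQuery expression replaces this must use the same convention.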
I have studied a lot of number theory, so maybe we can use some mathematical tricks. However, I'm still stuck on more basic BigQuery issues:
- How do I convert a bytes array to some sort of "big integer" type?
- Can I access a subrange of the bytes from a BYTES array?
- Given one byte (or maybe four bytes?), can I convert it to an integer type on which I can apply arithmetic operations?
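One mathematical trick that seems relevant to the last point (sketched in Python rather than BigQuery SQL, since I don't yet know the SQL byte-access syntax): by Horner's rule, (256*a + b) % y can be applied one byte at a time, so the full 128-bit integer never needs to be materialized. With y around 1000, every intermediate value stays below 256*y, which fits easily in a 64-bit integer type such as BigQuery's INT64.

```python
import hashlib

def md5_mod_bytewise(s: str, y: int) -> int:
    """Reduce the 128-bit MD5 value mod y one byte at a time
    (Horner's rule); no intermediate exceeds 256*y - 1, so
    64-bit arithmetic suffices for small y."""
    acc = 0
    for b in hashlib.md5(s.encode("utf-8")).digest():
        acc = (acc * 256 + b) % y
    return acc

# Agrees with reducing the full 128-bit integer directly:
x = int.from_bytes(hashlib.md5(b"example").digest(), "big")
assert md5_mod_bytewise("example", 1000) == x % 1000
```

So if BigQuery can extract individual bytes as integers, sixteen chained multiply-add-mod steps (or even one step per 4-byte chunk, still within INT64 range for small y) would be enough.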