0

For sharding I need good hash algorithm. I need to generate hash integer value for string within 0-2^31 range. Is this possible ?

Following answer give a way to generate integer from md5 algorithm.

Hashing a String to a Numeric Value in PostgresSQL

But is there a way to generate this value within a range ?

Community
  • 1
  • 1
Viraj
  • 5,083
  • 6
  • 35
  • 76
  • 1
    what if you apply modulus on the result ? – yurib Nov 12 '14 at 11:32
  • yeah, modulo operator `%` almost does, what you need - you just need to take account *negative remainders* http://stackoverflow.com/questions/22683600/real-mathematical-modulo-operation-in-postgres – pozs Nov 12 '14 at 11:33
  • People usually do this on the application side, because in general the application is doing data routing to shards. It's somewhat odd to need this at the database level. – Craig Ringer Nov 12 '14 at 11:36
  • Need to avoid modulo operator. This is to implement consistent-hashing. I'm trying to implement this as database functions so pgpool can use it. I don't have the luxury to implement it application layer ( such as java ). As design design I need to separate application from postgresql cluster information. – Viraj Nov 12 '14 at 11:39
  • @Viraj and why do you think, modulo isn't consistent? -- btw in your special case (your range is 0-2^31) you can use bitwise operators too, to extract the lower 31 bit. – pozs Nov 12 '14 at 11:51
  • @pozs when you say modulo, i thought `hash(key) mod n` where n is the number of servers. in that case node adding or removal causes re-hasing the data. I need to avoid that. However bitwise operators might be a good starting point to research. – Viraj Nov 12 '14 at 11:56

1 Answers1

1

In the function suggested in Hashing a String to a Numeric Value in PostgresSQL, the 32 bits range is expressed as the width of the bit(N) cast.

It cannot be passed as a parameter because a type modifier must be constant, but to get a fixed 31 bits, you may just adjust the function like this:

CREATE FUNCTION hash_string_into_31_bits(text) RETURNS int AS $$
   select ('x'||substr(md5($1),1,8))::bit(31)::int;
$$ language sql;

Then the result will be always 31 bits wide and positive, which is the same thing.

Community
  • 1
  • 1
Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156