3

Title says it all: what is the maximum value that can be returned from 'some random string'.hash in Ruby?

The docs don't offer much insight.

Travis
  • I think this is dependent on the Ruby implementation. You should not need to know this under normal use of Ruby. Why do you need to know that? – sawa Mar 27 '15 at 14:43
  • possible duplicate of [Ruby max integer](http://stackoverflow.com/questions/535721/ruby-max-integer) – maerics Mar 27 '15 at 14:48

2 Answers

2

The maximum value String#hash can return appears to be the maximum value of an unsigned long in your environment.

The String#hash function is implemented in rb_str_hash():

/* string.c, l. 2290 */

st_index_t
rb_str_hash(VALUE str)
{
    int e = ENCODING_GET(str);
    if (e && rb_enc_str_coderange(str) == ENC_CODERANGE_7BIT) {
        e = 0;
    }
    return rb_memhash((const void *)RSTRING_PTR(str), RSTRING_LEN(str)) ^ e;
}
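To see the effect of the encoding check from Ruby itself, here is a minimal sketch. It assumes CRuby, where String#hash is backed by rb_str_hash(); the helper name same_hash? is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: compares the String#hash values of two strings.
def same_hash?(a, b)
  a.hash == b.hash
end

ascii_utf8   = 'abc'.encode('UTF-8')
ascii_latin1 = 'abc'.dup.force_encoding('ISO-8859-1')

# Both strings contain only 7-bit ASCII bytes, so the encoding index
# should not be mixed into the hash and the two values should match
# (assumption: CRuby behaviour, as in the C source above).
puts same_hash?(ascii_utf8, ascii_latin1)
```

Strings with non-ASCII bytes are the case where the encoding is mixed in, so the same bytes under different encodings may hash differently.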

st_index_t is defined as type st_data_t:

/* st.h, l. 48 */

typedef st_data_t st_index_t;

st_data_t is an unsigned long:

/* st.h, l. 20 */

typedef unsigned long st_data_t;

Since the hash is computed with SipHash (keyed with sipseed.key), the entire range of values possible in an unsigned long should be available. In a typical 64-bit environment, unsigned long will be 64-bit. SipHash's output is 64-bit, so in a 32-bit environment without a native uint64_t, Ruby stores its output in a struct holding two 32-bit unsigned integers, and rb_memhash() combines them with a bitwise XOR.
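As a rough check of that range on a given machine, the sketch below derives the width of a native unsigned long using Array#pack with the 'L!' directive (native-size unsigned long). The constant names ULONG_BYTES and ULONG_MAX are made up for this example; Ruby does not define them.

```ruby
# Width in bytes of the platform's native unsigned long.
ULONG_BYTES = [0].pack('L!').bytesize

# Largest value an unsigned long of that width can hold.
ULONG_MAX = 2**(ULONG_BYTES * 8) - 1

puts "unsigned long: #{ULONG_BYTES * 8} bits, max #{ULONG_MAX}"
```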

In siphash.h:

/* siphash.h, l. 14 */

#ifndef HAVE_UINT64_T
typedef struct {
    uint32_t u32[2];
} sip_uint64_t;
#define uint64_t sip_uint64_t
#else
typedef uint64_t sip_uint64_t;
#endif

rb_memhash():

/* random.c, l. 1306 */

st_index_t
rb_memhash(const void *ptr, long len)
{
    sip_uint64_t h = sip_hash24(sipseed.key, ptr, len);
    #ifdef HAVE_UINT64_T
        return (st_index_t)h;
    #else
        return (st_index_t)(h.u32[0] ^ h.u32[1]);
    #endif
}
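The XOR step in the non-uint64_t branch can be sketched in Ruby as follows. This is only an illustration of the folding idea, not Ruby's own code; fold64_to_32 and MASK32 are hypothetical names.

```ruby
MASK32 = 0xFFFF_FFFF

# Hypothetical helper mirroring the #else branch of rb_memhash():
# split a 64-bit value into two 32-bit halves and XOR them together.
def fold64_to_32(h)
  low  = h & MASK32
  high = (h >> 32) & MASK32
  low ^ high
end

puts fold64_to_32(0x1234_5678_9ABC_DEF0).to_s(16)
```

Because XOR is symmetric, it does not matter which half ends up in u32[0] and which in u32[1]; the folded result always fits in 32 bits.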

Here's Ruby's sip_hash24(), if you want to look at the implementation.

Zoë Sparks
1

The Object#hash method returns a Fixnum, which:

Holds Integer values that can be represented in a native machine word (minus 1 bit).

Annoyingly, there doesn't appear to be an easy way to determine the exact maximum value on a particular system (there is an open feature request by Matz, #7517), so you currently have to compute it yourself.

The sample code below (https://stackoverflow.com/a/736313/244128) works on some Ruby platforms but not reliably on all of them:

FIXNUM_MAX = (2**(0.size * 8 -2) -1)
FIXNUM_MIN = -(2**(0.size * 8 -2))
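One way to sanity-check those bounds against String#hash is sketched below. It assumes CRuby, where the "minus 1 bit" rule holds; WORD_BITS and hash_in_fixnum_range? are hypothetical names, and on other implementations the computed bounds may not match the real limits.

```ruby
# Bits in a native machine word, as reported by Integer#size.
WORD_BITS  = 0.size * 8

# Same formula as above (assumption: one tag bit plus one sign bit).
FIXNUM_MAX = 2**(WORD_BITS - 2) - 1
FIXNUM_MIN = -(2**(WORD_BITS - 2))

# Hypothetical helper: true if str.hash lies within the computed bounds.
def hash_in_fixnum_range?(str)
  h = str.hash
  h >= FIXNUM_MIN && h <= FIXNUM_MAX
end

puts hash_in_fixnum_range?('some random string')
```

Note that the hash can be negative, so the lower bound matters as much as the upper one.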
maerics
  • [Feature #7517](https://bugs.ruby-lang.org/issues/7517). I still don't get why this isn't part of Ruby yet. Also your code isn't portable, since other Ruby implementations may have different max. and min. values. The original post also doesn't mention this. – cremno Mar 27 '15 at 15:36
  • @cremno: ya, I'm very surprised it's not built in. But doesn't the use of `0.size` (using [Fixnum#size](http://ruby-doc.org/core-2.2.1/Fixnum.html#method-i-size)) in the sample code make it portable between interpreters? – maerics Mar 27 '15 at 15:39
  • It isn't `Fixnum#size` that makes it non-portable. It's the `minus 1 bit` (in practice). See the comment by Charles Nutter (JRuby). – cremno Mar 27 '15 at 16:00