9

I am wondering why does Hashtable avoid using negative hashcode ?

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;

Where (hash & 0x7FFFFFFF) makes the signed bit to be 0 to positive, but why couldn't we treat the signed 32 bit integer as unsigned ? or even use the modular tricks to make it become positive. For example,

public static long int_mod(int hashcode, int tab_length){
     return (hashcode % tab_length + tab_length) % tab_length;  
} 
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
peter
  • 8,333
  • 17
  • 71
  • 94
  • I think this method is simple and work. And probably that is why it was used. `(hash & 0x7FFFFFFF)` narrow to positive, `% tab.length` narrow to tab size. Simple clean and easy. – Damian Leszczyński - Vash Sep 24 '12 at 14:05
  • which method are you referring to ? the original implementation ? – peter Sep 24 '12 at 14:08
  • Yes. The already implemented. – Damian Leszczyński - Vash Sep 24 '12 at 14:42
  • 1
    Integer division and modulus are by far the slowest operations (maybe 40 cycles on contemporary Intel/AMD CPUs), while `&` belong to the set of cheapest operations (1 cycle, can execute in parallel). So you solution would take about twice as much time as the original. – maaartinus Sep 26 '12 at 20:05

6 Answers6

12

The value has to be between 0 and tab.length - 1 because it is used as an index into an internal array (tab in this case) storing the values (and overflow elements). Therefore, it cannot be negative.

I assume (hash & 0x7FFFFFFF) % tab.length is used in preference of (hashcode % tab.length + tab.length) % tab.length because it is faster without unduly increasing the chance of collisions, but you would have to find a design document or talk to the original developers to know for sure.

verdesmarald
  • 11,646
  • 2
  • 44
  • 60
2

... but why couldn't we ...

You're asking why a particular implementation was chosen. Nobody can tell you that, except maybe the original author of the code, if he or she remembers.

There are always multiple ways to implement an idea in code. The person that's writing the code has to choose one of them. It doesn't make a lot of sense to ask, after the fact, why another particular implementation wasn't chosen.

Jesper
  • 202,709
  • 46
  • 318
  • 350
  • I haven't checked, but I guess so. Why are you unhappy with how it's implemented as it is? – Jesper Sep 24 '12 at 14:03
  • 1
    @Jesper: IMHO it does make a lot of sense, so we can learn from the decision. Of course, quite often nobody can tell for sure, but arguments can be found and evaluated. This makes the question to a sort of discussion, which is not welcome here on SO, but it's very useful. – maaartinus Sep 26 '12 at 20:00
  • What a weird thing to say. Code can only be explained by author and nobody else? – Jeffrey Blattman Mar 29 '22 at 21:36
  • @JeffreyBlattman Imagine that your neighbour paints their house blue, and now you're going to ask random other people why your neighbour didn't paint their house red. Nobody can answer that... If you really want to know, then the one person you should ask is your neighbour himself. – Jesper Mar 30 '22 at 07:00
  • @Jesper if your coding paradigms are as arbitrary as selecting a color for house, I'd say that's not engineering, that's art. Okay, if you're coding for youself, have at it. If you ever want other people to understand your code, you ought to be thinking about making your reasons understandable through some combination of using well understood design patterns, clear coding, comments, and additional documentation. – Jeffrey Blattman Mar 30 '22 at 22:50
2

If you keep your capacity a power of 2,

private static final int CAPACITY = 64;
private static final int HASH_MASK = CAPACITY - 1;

final int index = obj.hashCode() & HASH_MASK;

Basically, mask out all but the lower bits in which you are interested. Assuming the lower N bits has as even of a distribution as the entire hash code.

Jeffrey Blattman
  • 22,176
  • 9
  • 79
  • 134
1

Java has no native unsigned type. If the hashCode would have negative values then we will have to apply such masking tricks everywhere we use hashCode as an index into array.

Serge
  • 6,088
  • 17
  • 27
1

There is an ostensibly great reason why we can’t treat the signed int as unsigned: the original Java developers thought that unsigned support was an unnecessary complication, as unsigned arithmetic can be confusing. This hasn’t been a big enough issue for Java to address since.

As verdesmerald mentioned, since there is no clear record of why (hash & 0x7FFFFFFF) % tab.length was chosen over something to the effect of your clever modding, although we can find justifications for the decision, ultimately we can only speculate about why it was made.

One final point of semantics, which probably isn’t that important: It’s not so much that Hashtable isn’t using negative hashcodes as it is that the hashcodes are being “translated” into a nonnegative form for the index.

Community
  • 1
  • 1
Gregory Gan
  • 105
  • 11
0

Nobody can tell you about why the orginal author chose that implementation, except himself (and maybe his colleagues). It doesn't really matter anyhow because it works fine.

About what your proposed implementation: It probably doesn't do what you think it should be doing. You should refresh what the % operator in java really does: For example here. Add integer overflow into the mix and your proposed expression can result in negative values...

Community
  • 1
  • 1
Durandal
  • 19,919
  • 4
  • 36
  • 70