13

I have a program for my Java class where I want to use hashSets to compare a directory of text documents. Essentially, my plan is to create a hashSet of strings for each paper, and then add two of the papers hashSets together into one hashSet and find the number of same 6-word sequences.

My question is, do I have to manually check for, and handle, collisions, or does Java do that for me?

marcinx27
  • 243
  • 1
  • 3
  • 15

2 Answers2

10

Java Hash Maps/Sets Automatically handle Hash collisions, this is why it is important to override both the equals and the hashCode methods. As both of them are utilised by Sets to differentiate duplicate or unique entries.

It is also important to note that these hash collisions hava a performance impace since multiple objects are referenced by the same Hash.

public class MyObject {
private String name;

//getter and setters


public int hashCode() {
   int hashCode = //Do some object specifc stuff to gen hashCode
   return int;
}

public boolean equals(Object obj) {
   if(this==obj) return true;
   if(obj instanceOf MyObject) {
       if(this.name.equals((MyObject)obj.getName())) {
           return true;
       }
   return false;
}
}
}

Note: Standard Java Objects such as String have already implemented hashCode and equals so you only have to do that for your own kind of Data Objects.

Thiyagu
  • 17,362
  • 5
  • 42
  • 79
dngfng
  • 1,923
  • 17
  • 34
  • 2
    Okay, cool. I read a fair amount of posts places saying that HashMaps had built-in collision handling, but I couldn't find anything that specifically stated that HashSets had built-in collision handling. – marcinx27 Oct 16 '12 at 07:18
  • Please note my edit: it is important to override both hashCode and equals methods, since both of these are utilised by Sets to identifie duplicates. – dngfng Oct 16 '12 at 07:20
  • Well you will have to implement the .equals() and hashCode() method for the object you are putting in the set. As these are evaluated by the Sets to determine wether or not it is a unique entry. – dngfng Oct 16 '12 at 07:23
  • So, if I'm putting a bunch of strings of n-length words into a hash set, do I have to check the hashCode of that, and if that hashCode matches an already existing hashCode, then check to see if they are the the same string? Edit: Essentially, I'm asking what/how I am going to implement this check. – marcinx27 Oct 16 '12 at 07:25
  • no, you don't need to fiddle with hashCode of Strings in any way - see my answer – Christian Oct 16 '12 at 07:28
2

I think you did not ask for hash collisions, right? The question is what happens when HashSet a and HashSet b are added into a single set e.g. by a.addAll(b).

The answer is a will contain all elements and no duplicates. In case of Strings this means you can count the number of equal String from the sets with a.size() before add - a.size() after add + b.size().

It does not even matter if some of the Strings have the same hash code but are not equal.

Christian
  • 13,285
  • 2
  • 32
  • 49
  • Its only true if you add String objects to the Set or other objects that have already implemented hashCode and equals. If you have your own Object you will definantly have to implement both of them. – dngfng Oct 16 '12 at 07:30
  • Kind of. What I'm saying is that if I do a.addAll(b), is it certain that I will not have any duplicates and is it certain that every unique string from a and b will be there? – marcinx27 Oct 16 '12 at 07:30
  • @dngfng So if I am using strings, I do not need to check for collisions. I can just put all of the strings into their hashSets and be certain that everything unique is in there, and that there are no duplicates? – marcinx27 Oct 16 '12 at 07:31
  • No - you never have to check for collisions, but if you have your own Object you will need to implement both equals and hashCode as otherwise it will not work properly. The actual collision detection is done for you by the HashSet by using those methods. – dngfng Oct 16 '12 at 07:33
  • @dngfng Thanks. That's what I figured, but my professor kept stressing to me that I needed to check for collisions and handle them manually. – marcinx27 Oct 16 '12 at 07:35
  • That would only make sense if he expected you to implement your own Set instead of using standard Java HashSet. – dngfng Oct 16 '12 at 07:55