13

So I have a custom class Class that will have a set of another custom class Students. So it will look something like this:

public class Class {
    private Set<Student> students;

    // other methods
}

Now I will be adding and removing many students to the set students and i will also be changing many of the private fields of a student already in the set of students.

QUESTION: What data structure should I use to best implement this? Since I will be changing the property of the Student objects in set student (thereby changing the hashcodes) should I use an ArrayList instead?

tk_
  • 16,415
  • 8
  • 80
  • 90
letter Q
  • 14,735
  • 33
  • 79
  • 118
  • 1
    How you intend to lookup the students to modify them - this will suggest how they should be stored - map vs list. – Tom Aug 01 '13 at 04:07
  • if you have a "primary key" like ID in Student and you don't change it, then you won't have prlbem, using a set – nachokk Aug 01 '13 at 04:09
  • 8
    Of all the names to give to your class, `Class` is arguably the most stupid choice possible. – Bohemian Aug 01 '13 at 04:11

9 Answers9

20

When its comes to the behavior of ArrayList and HashSet they are completely different classes.

ArrayList

  • ArrayList Does not validate duplicates.
  • get() is O(1)
  • contains() is O(n) but you have fully control over the order of the entries.

                          get  add  contains next remove(0) iterator.remove
    ArrayList             O(1) O(1) O(n)     O(1) O(1)      O(1)
    
  • Not thread safe and to make it thread safe you have to use Collections.synchronizedList(...)

HashSet

  • HashSet ensures there are no duplicates.
  • Gives you an O(1) contains() method but doesn't preserve order.

                          add      contains next     notes
    HashSet               O(1)     O(1)     O(h/n)   h is the table 
    
  • Not thread safe and to make it thread safe you have to use Collections.synchronizedSet(...)
steliosf
  • 3,669
  • 2
  • 31
  • 45
tk_
  • 16,415
  • 8
  • 80
  • 90
  • 1
    **Correction:** `ArrayList` `get()` complexity is `O(1)`, not `O(n)` – steliosf Nov 13 '17 at 20:14
  • 1
    Yes you are correct and thanks for the update. Next time please use a reference then it will be easier for peer reviewers. – tk_ Nov 14 '17 at 01:50
7

What data structure should I use to best implement this? Since I will be changing the property of the Student objects in set student (thereby changing the hashcodes) should I use an ArrayList instead?

If the hashcodes for the set elements are liable to change, then you should NOT be using a HashSet. (If you do, the data structure will break, and elements in the set are liable to go missing.)

But I doubt you should be using ArrayList either, because if hashcode() is sensitive to changes to the object, then equals(Object) will most likely be too. And that means that contains(...) and similar methods won't be able to find objects.

I think you should be using a Map type, and using a "student identifier" as the key.

(You could also override hashcode and equals so that equality means that two objects have the same id. But that makes equals(Object) useless for other purposes.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
3

If you have duplicate data in your code then you should use ArrayList otherwise you can use hashset as shown below So, if your code don't need the duplicate values then use Set instead of list because the set will give much better performance (O(n) vs O(n^2) for the list), and that's normal because avoiding duplicates is the very purpose of a set.

ArrayList

public static void main(String[] args) {

ArrayList arr =new ArrayList();
arr.add("Hello");
arr.add("is");
arr.add("Hello");
System.out.println(arr);  //As we are using Arraylist therefore 
                          //the duplicate elements are allowed therefore
                          //"Hello" is not removed in the output
    

}

HashSet

public static void main(String[] args) {

HashSet arr =new HashSet();
arr.add("Hello");
arr.add("is");
arr.add("Hello");
System.out.println(arr);  //As we are using Hashset therefore 
                          //the duplicate elements removed therefore
                          //"Hello" is removed in the output
    

    

}

Community
  • 1
  • 1
Rajat
  • 37
  • 5
2

It depends. As you are talking about student so must be there is somthing like id or rollno which is unique. If yes then override the hashcode method and implement the hashcode on the basis of their id's. Then there is no effect on the hashcode by changeing any of the other properties of student.

To chose Set or List is totaly depends upon your requirements. Read this link, and it will clarify the difference between Set and list
What is the difference between Set and List?

And if you are using objects in a Set then you can try to override both the hashcode and the equals method so that control of uniqueness is in you hands.

Community
  • 1
  • 1
Ashish Aggarwal
  • 3,018
  • 2
  • 23
  • 46
1

The javadoc for Set says

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

So if you are going to use a HashSet if you make hashCode() and equals() based with inmutable fields then you won't have this problem. For example using an unique studentID for each instance.

nachokk
  • 14,363
  • 4
  • 24
  • 53
1

From your requirement, I thought the best structure should be Map. Set actually underlying uses the Map structure inside, and you also need taking care the equals method override for better lookup. And set and arraylist find the target object need take some find algorithm so it's not so efficient as you expected (especially in the very large collection situation). Even map will waste some space, but if your ID is some kind of primitive type, you could consider the primitive type of map implementation in the Trove library.

assylias
  • 321,522
  • 82
  • 660
  • 783
Andy Song
  • 11
  • 1
  • (I would not recommend looking at Trove for a "homework" problem like this. The memory usage "problem" is not relevant here. And, if this was a real application, it is highly unlikely you would keep the student database in memory in the first place.) – Stephen C Aug 01 '13 at 04:37
1

QUESTION: What data structure should I use to best implement this? Since I will be changing the property of the Student objects in set student (thereby changing the hashcodes) should I use an ArrayList instead?

Definitely if you are gonna to change values used by hashCode or equals it is not possible to use HashMap or HashSet.

You are saying that you want to remove and add a lot. The question is if you want to do it sequntially or randomly(based on index). If you add, remove sequentially then definitely the best choice is LinkedList. If you access objects randomly then ArrayList is much more efficient.

Volodymyr Levytskyi
  • 3,364
  • 9
  • 46
  • 83
0

For a hashed collection such as HashSet, the key should be immutable. Hashset uses hashing internally to decide the bucket to store the object. And also while retrieving the object it will use hash to find the object bucket. If you are changing the object after storing, it may change the hashcode of the object and Set may not be able to retrieve the correct object. If you need to change the object even after adding it to the collection then using a hashed collection is not a good choice. Rather go for Arraylist, but note that with ArrayList you will lose the advantage to retrieve the desired Student quickly, as it could be with a Set.

Juned Ahsan
  • 67,789
  • 12
  • 98
  • 136
  • How do you retrieve a Student quickly being contained in a Set? – morgano Aug 01 '13 at 04:06
  • HashSet uses hashing internally, which makes the retrieval of objects quicker. – Juned Ahsan Aug 01 '13 at 04:07
  • Can you please add an example of how to retrieve a specific Student from a Set? I'm sincerely curious about it, if that's true then I've been underestimating Sets all this time – morgano Aug 01 '13 at 04:11
  • @morgano Don't confuse the `Set` interface with the performance of `HashSet`, which is really just a `HashMap` that doesn't care about its values. Read about [hash tables](http://en.wikipedia.org/wiki/Hashtable) to understand how the data structure itself can perform well. – Paul Bellora Aug 01 '13 at 04:55
  • @PaulBellora I know a HashSet works with an internal HashMap, but that doesn't change the fact that to retrieve a Student from a Set I'd have to iterate over the Set anyway (same as in an ArrayList) the "advantages" of the internal HashMap are not being used at all. I sincerely though I was missing something about Sets, but this answer still doesn't mention clearly why you could retrieve an object quickier with a HashSet than with an ArrayList. – morgano Aug 01 '13 at 05:08
  • @morgano It sounds like you're making assumptions about the implementation of `HashSet` that aren't true. I recommend checking out the [source on grepcode](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashSet.java#HashSet.contains%28java.lang.Object%29). – Paul Bellora Aug 01 '13 at 05:10
  • @morgano HashSet or any hashed collection uses hashing to store and retreive objects. It means, the hashcode will be called to store and retrieve the object. When JVM tries to retrieve the object, it will simply get the location of the bucket where it is stored using the hashcode. This retrieval is of order O(1), provided hashcode is properly implemented to generate unique hashcode for different objects. – Juned Ahsan Aug 01 '13 at 05:11
  • @PaulBellora go to your JDK dir and unzip the source code (or google) and have a look at /java/util/HashSet.java, line 93 (java 7) – morgano Aug 01 '13 at 05:18
  • @JunedAhsan It's simple, just state an example where you retrieve an object from a HashSet (if you do it you'll have my upvote). A Set doesn't have any methods to retrieve a specific element, at much it removes or tells you that an object (or an object "equals" to the first object) is contained in a Set, to "retrieve" an object from a Set you need to iterate over it (and iteration doesn't take any advantage of hashing) – morgano Aug 01 '13 at 05:24
  • @morgano I'm not going to do that because it's a tiresome way to communicate, sorry. Unlike `HashSet` lookups, which are a super-efficient O(1) with proper hashing, I promise. – Paul Bellora Aug 01 '13 at 06:38
  • @PaulBellora I'm quite sure you already did it and found your error. In summary: HashSet lookups you refer to only take place in removal and to return a boolean to verify that the object is in the Set. You can't retrieve anything straight from the Set. – morgano Aug 01 '13 at 06:48
  • @morgano There's no "retrieving" from a `Set` - it either contains the key you already have or it doesn't. Unless you're talking about iterating the set and finding the *original instance* using `equals`, in which case that's a super rare use case and I wish you'd spelled that out hours ago. – Paul Bellora Aug 01 '13 at 06:54
  • @PaulBellora check the last part of the answer: "... but note that with ArrayList you will lose the advantage to retrieve the desired Student quickly, as it could be with a Set". Good think we at last agree on something: You can't retrieve a Student from a Set unless you iterate over it. There is no such "advantage" to lose. – morgano Aug 01 '13 at 07:01
0

You should not use a Set when the results of objects' equals methods will change. If you're identifying students by a stable unique ID number, and equals just checks that ID, then using a Set is fine.

Note that HashSet will use hashCode for indexing and comparison, and hashCode should incorporate exactly those fields that are used to determine equals.

chrylis -cautiouslyoptimistic-
  • 75,269
  • 21
  • 115
  • 152