2

Suppose I have a hash set of request IDs that I've sent from a client to a server. The server's response returns the request ID that I sent, which I can then remove from the hash set. This will be run in a multithreaded fashion, so multiple threads can be adding to and removing IDs from the hash set. However, since the IDs generated are unique (from a thread safe source, let's say an AtomicInteger for now that gets updated for each new request), does the HashSet need to be a ConcurrentHashSet?

I would think the only case this might cause a problem would be if the HashSet encounters collisions which may require datastructure changes to the underlying HashSet object, but it doesn't seem like this would occur in this use case.

Roman C
  • 49,761
  • 33
  • 66
  • 176
Marcus
  • 2,128
  • 20
  • 22
  • The HashSet implementation needs to support concurrency (like `ConcurrentHashSet`) or be wrapped by `Collections.synchronizedSet`. – Drew MacInnis Aug 18 '13 at 19:10

1 Answers1

3

Yes. Since the underlying array for the hash table might need to be resized for instance and also because of course IDs can collide. So having different keys will not help at all.

However, since you know that the IDs are increasing, and if you can have an upper bound on the maximum number of IDs outstanding (lets say 1000). You can work with an upper and lower bound and a fixed size array with offset indexing from the lowest key, in which case you will not need any mutexes or concurrent data structure. Such data structure is very fragile however since if you have more than your upper bound oustanding hell will break loose. So unless performance is of concern, just use the ConcurrentHashSet.

Janick Bernet
  • 20,544
  • 2
  • 29
  • 55
  • But I'm not sure if the keys would collide for something like an integer that is unique. If it was a string, and hashes of strings collided, I would agree. But I can't see it happening for an integer, especially if the HashSet was presized. I think if the initial size and load factor are properly set, there wouldn't be collisions if you already know the range of unique values over a relatively small range (0-(2^30-1)). I am considering performance as a concern, which is why I'm exploring this. – Marcus Aug 19 '13 at 20:09
  • I don't see how it makes any difference whether those are integers or hashes on strings in regards to the likelihood of collisions. A good hash table should spread input data either way, but there will be collisions eventually and with too high probability for you to ignore. – Janick Bernet Aug 20 '13 at 13:32
  • `hashCode()` produces an integer. Suppose I create a class `MyInteger` for my `HashSet` where `MyInteger` has the same hash code as the integer value. Then by definition, there will be no hash collisions. I can't easily do the same thing for an arbitrary length string, as that will be guaranteed to eventually have collisions, but the `MyInteger` class is guaranteed not to. – Marcus Aug 21 '13 at 15:47
  • Yes, the hashcode will not collide, which is what I argued should rarely happen even with strings anyway. However, the *buckets* will collide, and that's all that matters. Maybe the answer here helps to clarify what is the issue: http://stackoverflow.com/a/10879475/369310 or here: http://stackoverflow.com/questions/6493605/how-does-java-hashmap-work – Janick Bernet Aug 22 '13 at 10:48
  • So buckets wouldn't collide if the initial hash set capacity was equal to the maximum expected value (and, as previously mentioned, the hashCodes are generated as mentioned above). That said, if you know all that, you might as well just have a static fixed array instead of a `HashSet`. – Marcus Aug 30 '13 at 20:56