0

I understand that hash tables are designed to have easy sorting and retrieval of data when storing massive amounts of them. However, when retrieving a specific piece of data, how do they retrieve it if they were stored in an alternative location due to collision?

Say there are 10 indexes and data A was stored in index 3 and data E runs into collision because data A is stored in index 3 already and collision prevention puts it in index 7 instead. When it comes time to retrieve data E, how does it retrieve E instead of using the first hash function and retrieving A instead?

Sorry if this is dumb question. I'm still somewhat new to programming.

btrballin
  • 1,420
  • 3
  • 25
  • 41
  • With a hastable each bucket has a list of values. The hash function goes to the head element and then the list is iterated. – BevynQ Mar 15 '16 at 04:09
  • https://en.wikipedia.org/wiki/Hash_table – Oleg Estekhin Mar 15 '16 at 04:11
  • Back in college (late 1970s) I learned about hash tables in which the algorithm finds another bucket if the first bucket is occupied. I have _never_ seen them used. Chaining (where every bucket has a set of values, not just one, and you just add a new value to the set if there's a "collision") is the standard way to do things, and you have to have an very good reason to do it any other way. – ajb Mar 15 '16 at 04:26
  • Hash tables have nothing to do with sorting. – user207421 Mar 15 '16 at 04:59
  • I know they aren't sorted. I meant to say they are placed inside the hash table in a systematic way. So in a way they are being sorted, but not through the literal definition of sorting in programming. – btrballin Mar 15 '16 at 06:43
  • So they aren't being sorted, and you know that, so why say it? – user207421 Mar 15 '16 at 09:40

1 Answers1

1

I don't believe that Java will resolve a hashing collision by moving an item to a different bucket. Doing that would make it difficult if not impossible to determine the correct bucket into which it should have been hashed. If you read this SO article carefully, you will note that it points out two tools which Java has at its disposal. First, it maintains a list of values for each bucket* (read note below). Second, if the list becomes too large it can increase the number of buckets.

I believe that the list has now been replaced with a tree. This will ensure O(n*lgn) performance for lookup, insertion, etc., in the worst case, whereas with a list the worst case performance was O(n).

Community
  • 1
  • 1
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Tim has give good explanation. You should accept this answer. I think the brbalin got confused by Java8 statement resolve collision by switching from list to tree after certain threshold. – Raghu K Nair Mar 15 '16 at 04:16
  • What if collision prevention is done via double hash function or probing? – btrballin Mar 15 '16 at 04:46
  • Sure this is possible but I don't think Java collections does this. – Tim Biegeleisen Mar 15 '16 at 04:47
  • I know it doesn't, but I'm wondering how it would handle retrieving if double hashing or probing was done. – btrballin Mar 15 '16 at 06:43
  • @btrballin I think the Wikipedia article on hash tables explains how it's done. In all probability, very few people here would know the answer without looking it up in Wikipedia anyway. Those kinds of hash tables just aren't used. – ajb Mar 15 '16 at 06:53
  • ajb: closed hashing implementations most certainly are used in mainstream computing (e.g. python dicts, and indeed things like classes - with their member functions/variables/types - are all in such hash tables); and - when I worked there and presumably still - we used closed hashing for the tables at Bloomberg storing financial tick data from all around the world. Anyway @btrballin - the answers simply consistency: if the insertion routine uses some algo to consider alternative buckets, the same algo should be used by the find, erase etc. operations. – Tony Delroy Mar 25 '16 at 05:31