2

This is a homework question, but I think there's something missing from it. It asks:

Provide a sequence of m keys to fill a hash table implemented with linear probing, such that the time to fill it is minimum.

And then

Provide another sequence of m keys, but such that the time fill it is maximum. Repeat these two questions if the hash table implements quadratic probing

I can only assume that the hash table has size m, both because it's the only number given and because we have been using that letter to address a hash table size before when describing the load factor. But I can't think of any sequence to do the first without knowing the hash function that hashes the sequence into the table.

If it is a bad hash function, such that, for instance, it hashes every entry to the same index, then both the minimum and maximum time to fill it will take O(n) time, regardless of what the sequence looks like. And in the average case, where I assume the hash function is OK, how am I supposed to know how long it will take for that hash function to fill the table?

Aren't these questions linked to the hash function stronger than they are to the sequence that is hashed?

As for the second question, I can assume that, regardless of the hash function, a sequence of size m with the same key repeated m-times will provide the maximum time, because it will cause linear probing from the second entry on. I think that will take O(n) time. Is that correct?

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
Heathcliff
  • 3,048
  • 4
  • 25
  • 44

2 Answers2

1

This questions doesn't sound terribly concerned with the hash function, but it would be nice to have. You seem to pretty much get it, though. It sounds to me like the question is more concerned with "do you know what a worst-case list of keys would be?" than "do you know how to exploit bad hash functions?"

Obviously, if you come up with a sequence where all the entries hash to different locations, then you have O(1) insertions for O(m) time in total.

For what you are saying about hashing all the keys to the same location, each insertion should take O(n) if that's what you are suggesting. However, that's not the total time for inserting all the elements. Also, you might want to consider not literally using the same key over and over but rather using keys that would produce the same location in the table. I think, by convention, inserting the same key should cause a replacement, though I'm not 100% sure.

I'll apologize in advance if I gave too much information or left anything unclear. This question seems pretty cut-and-dried save the part about not actually knowing the hash function, and it was kind of hard to really say much without answering the whole question.

Dylan M.
  • 116
  • 1
  • 7
  • 1
    there is robin hood hashing, that replaces existing elements based on their probe count. bUt in general you don't replace existing elements while inserting new elements. ( Not considering reshaping the hash table, it is just another topic) – Seçkin Savaşçı Jul 02 '12 at 17:54
  • Ok, thanks for the info. Still, it doesn't make so much sense to have multiple values on the same key in a hash table since you won't be able to retrieve one of the values or you won't know if you're getting the right one. – Dylan M. Jul 02 '12 at 18:14
  • You misunderstand the situation. You never have 2 values in a cell. Late comer searches for another place, based on probing scheme. And you validate the cell content while retrieving it. So you can know that you are getting the needed data. – Seçkin Savaşçı Jul 02 '12 at 18:17
  • I understand. What I'm saying is the if you insert and element with a key of 5, and then you insert another element with a key of 5, the user does not know which value they will get when they try to get the value associated with the key 5. I know how linear probing works, haha. – Dylan M. Jul 02 '12 at 18:20
  • He will know. Because we are agnostic to hash function's candidate for storing the data. For populating the table: we give a value and a hash index, hash index is given to hash function as an input. Hash function outputs an index in table. We probe the address of the index value. If it is empty, we place our value with the hash index together. So while retrieving, we can check that the query hash index and the stored hash index in the cell is the same. – Seçkin Savaşçı Jul 02 '12 at 18:22
  • I understand how linear probing works, but what I'm saying is not a matter of collision handling. Suppose I have a table of phone numbers, and I map 5555555555 to "Bob" (so "Bob" is my key), and then I have someone else named "Bob" who I map 9999999999 to. Now I have a duplicate key in my table, and later on I find I need to give Bob a call. So I use the key "Bob" to retrieve the phone number I want, only it might happen to be the wrong one because there are multiple numbers on that key. I understand that this example might be a little far fetched, but I believe it illustrates my point. – Dylan M. Jul 02 '12 at 18:40
  • same key points to same cell location,you can't have duplicate keys pointing different locations(simply against the concept). so insertion "Bob" again will overwrite the data 555555555 to 99999999. But simply in that situation you check if the cell is empty before writing to it. – Seçkin Savaşçı Jul 02 '12 at 21:03
  • 1
    I think there has been a misunderstanding. I said that I assumed you replaced elements when inserting duplicate keys, then you called me out saying that you don't. And now, you're saying I'm wrong by agreeing with me. Maybe I'm just not very good at communicating, but I'm done having this discussion because one or both of us seems to be having a very hard time understanding the other. – Dylan M. Jul 02 '12 at 22:05
1

Well, the idea behind these questions is to test your understanding of probing styles. For linear probing, if a collision occurs, you simply test the next cell. And it goes on like this until you find an available cell to store your data. Your hash table doesn't need to be size m but it needs to be at least size m.

First question is asking that if you have a perfect hash function, what is the complexity of populating the table. Perfect hashing function addresses each element without collision. So for each element in m, you need O(1) time. Total complexity is O(m).

Second question is asking for the case that hash(X)=cell(0), which all of the elements will search till the first empty cell(just rear of the currently populated table).

For the first element, you probe once -> O(1)

For the second element, you probe twice -> O(2)

for the nth element, you probe n times -> O(n)

overall you have m elements, so -> O(n*(n+1)/2)

For quadratic probing, you have the same strategy. The minimum case is the same, but the maximum case will have O(nlogn). ( I didn't solve it, just it's my educated guess.)

Seçkin Savaşçı
  • 3,446
  • 2
  • 23
  • 39