How can quadratic probing fail to find a location on the next insertion while linear probing always finds one?

Question

I am doing a practice question from Data Structures Practice

The Question
   1.Linear Probing will (circle one):
        i.gradually degrade in performance as more values are inserted
       ii.May not find a location on the next insertion
       iii.None of the above

   2.Quadratic Probing will (circle one):
        i.gradually degrade in performance as more values are inserted
       ii.May not find a location on the next insertion
       iii.None of the above

According to the answer key(from the link), the answer to 1 is i and the answer to 2 is ii.

I agree with the answer question 1. Linear probing will explore all possibilities and wrap to the beginning of the hashtable if needed. Therefore it will find a location on the next insertion. If you insert a bunch of values that map to the same bucket or near one bucket, clustering will result and performance will degrade.
I understand why the the answer to question 2 isn't i. Quadratic increment probs by different increments to avoid the clustering issue. However can some explain the intution behind how quadratic probing "may not find a location on the next insertion"
A quadratic probing function is defined with (from Quadratic Probing)
nth probe being ((h(k) + n²) mod TableSize) until the probe hits a zero(unoccupied)

From what I learned in my other question Quadratic Probing, it is possible for a quadratic probe to hit every bucket. Like the linear probe, the quadratic probe should also wrap tot he beginning of the hashtable if needed. Why then in this question, can a quadratic probe not find a location on the next insertion while a linear probe can?

You say when a "zero (unoccupied)", but couldn't it be the table has no unoccupied openings. Although quadratic probing avoids primary clustering, it creates secondary clustering. — ChiefTwoPencils, Mar 15 '15 at 05:01
But secondary clustering wouldn't degrade performance as much as primary clustering. I think that's what the question meant. — committedandroider, Mar 15 '15 at 05:11
This explains it reasonably well. https://en.wikipedia.org/wiki/Quadratic_probing#Limitations — Jim Mischel, Jun 26 '16 at 03:11

mcdowella · Answer 1 · 2015-03-15T07:09:14.363

3

To find out if h(k) + n^2 searches all possibilities you need to find out if n^2 takes up all possible values mod the hash table size - say N. So you need to know if by choosing all the N possibilities for n you can have n^2 take up all the N possible different values.

(-n)^2 = n^2 so here are different input values to the square function that produce the same output values. So it is not possible to produce all the N different output values, because there are collisions between the results of the different input values.

Example - work mod 7. 1^2 = 6^2 = 1. 2^2 = 5^2 = 4. 3^2 = 4^2 = 2. 7^2 = 0. So if you square the inputs (0, 1, 2, 3, 4, 5, 6) you get the outputs (0, 1, 4, 2, 2, 4, 1) and you cannot produce 3, 5, or 6 - in fact this example is enough to show that you cannot guarantee to search all possible slots, but the math above is neat enough to be more reliable than my arithmetic, as well as showing that the behaviour here is pretty general.

edited Mar 15 '15 at 07:09

answered Mar 15 '15 at 06:21

mcdowella

19,301
2
19
25

If you cannot guarantee to search all possible slots, would the big oh runtime still be O(N)? – committedandroider Mar 15 '15 at 06:28
I think in most cases it would take you O(N) to search all accessible slots, but we are now talking about the cost of code that should never be executed or even written (increase the size of the hash table instead after the first few probes fail, or use some sort of secondary chaining or something) so the question isn't very well defined. – mcdowella Mar 15 '15 at 07:01
I asked my instructor to clarify and she said that we can assume the array size is prime and we rehash when the load factor reaches 0.5 so all spots can be hit. I guess its O(N) in the worst case as well? – committedandroider Mar 16 '15 at 00:08
If the array size is prime then the n^2 values split up nicely into n and -n except for 0 and so attain (N+1)/2 different values. If everything hashes to the same value I guess you just fill the achievable slots before rehashing and the cost for the last insert is O(N). (We know the solutions to a^2 = b^2 are the solutions to (a-b)(a+b) = 0 mod N and since N is prime either a = b or a = -b). – mcdowella Mar 16 '15 at 05:31

score 3 · Answer 2 · answered Jun 25 '16 at 23:31

I think the question is about in what case quadratic probing will not be able to find the next location at all in case there is a collision.

With the below example, I do see that quadratic probing is not able to find a location in case of collisions with same resultant key.

Let's say the hash table size is 7.

Here are the numbers to be inserted 23, 39, 9, 16, 30.

h(k, i) = [h(k) + sqr(i)] mod 7 where h(k) = k mod 7.

for i = 0, h(k,0) = h(k)
for i = 1, h(k,1) = [h(k) + 1] mod 7
for i = 2, h(k,2) = [h(k) + 4] mod 7
for i = 3, h(k,3) = [h(k) + 9] mod 7

23 --> 23 % 7 = 2

39 --> 39 % 7 = 4

9  --> 9 % 7 = 2 <--- Collision
   2 + 1 = 3

16 --> 16 % 7 = 2 <--- Collision
   2 + 1 = 3 <--- Collision
   2 + 4 = 6

30 --> 30 % 7 = 2 <--- Collision

   2 + 1 = 3 <--- Collision

   2 + 4 = 6 <--- Collision
   2 + 9 = 4 <--- Collision
   2 + 16 = 4 <--- Collision
   2 + 25 = 6 <--- Collision
   2 + 36 = 3 <--- Collision
   2 + 49 = 2 <--- Collision
   2 + 64 = 3 <--- Collision
   2 + 81 = 6 <--- Collision
   2 + 100 = 4 <--- Collision
   2 + 121 = 4 <--- Collision
   2 + 144 = 6 <--- Collision
   2 + 169 = 3 <--- Collision
   2 + 196 = 2 <--- Collision
   2 + 225 = 3 <--- Collision
   2 + 256 = 6 <--- Collision
   2 + 289 = 4 <--- Collision
   2 + 324 = 4 <--- Collision
   2 + 361 = 6 <--- Collision
   2 + 400 = 3 <--- Collision

This would be the case with any key k with (k mod size) equal to 2 (like 37, 44, 51, 58, etc.

score 1 · Accepted Answer · answered Mar 15 '15 at 06:10

1

There must be a proof for this out there somewhere. But, I don't see how quadratic probing could hit every bucket in most cases.

Let's say the table size is 7, and h(k) is 0. For the ith iteration, probe = i^2 mod 7. I tested all i less than 10,000 and this always evaluates to 0, 1, 2, or 4 for any i. Buckets 3, 5, and 6 will never be probed.

Here's the script I used:

var hash_of_k = 0;
var table_size = 7;
var iteration_limit = 10000;
var buckets =  new Object();

//Probe buckets
for(var i=0; i<iteration_limit; i++){
  b = (hash_of_k+(i*i)) % table_size;
  buckets[b] = 1;
}

//Report which buckets were probed.
var buckets_probed = '';
for(b in buckets){
  buckets_probed += b + '  ';
}

alert(buckets_probed);

You could set the iteration limit higher, but that just doesn't seem practical. It seems like the whole point of quadratic probing would be to find an empty bucket faster than linear probing.

answered Mar 15 '15 at 06:10

Seamus

4,539
2
32
42

What would the big oh runtime be if this doesn't hit every bucket? – committedandroider Mar 15 '15 at 06:18
That's really good question. Ideally you'd want to know for any given table size, what's the maximum number of iterations before you know that you won't find an empty bucket. It seems like the runtime would scale linearly with the size of the table, and with the number of occupied buckets. But since you don't probe all the buckets, the maximum runtime would be less than that of a linear probe. Just playing with the script, it seems like it usually probes about half the buckets. So maybe O(1/2 x Table Size x Buckets Occupied)? – Seamus Mar 15 '15 at 06:56
You've discovered a well known limitation of quadratic probing. https://en.wikipedia.org/wiki/Quadratic_probing#Limitations – Jim Mischel Jun 26 '16 at 03:12

How can quadratic probing fail to find a location on the next insertion while linear probing always finds one?

3 Answers3