2

My professor gave us this slide while explaining Hash Collision probabilities:

enter image description here

When I looked up the probabilities of two people having the same birthday in the "Birthday Paradox", I found on Wikipedia and other sources that the probability at n=10 is supposed to be 11.7. In fact every value I found and calculated myself using his formula was different from the professor's slide.

So my question is: when he asks "How many students can we hash into our table before a collision occurs," is that different from calculating the probability that any 2 students has the same birthday?

And if so, is there a formula for that?

Or was his slide was just wrong?

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
xChaos
  • 41
  • 1
  • 2

4 Answers4

1

When in doubt, let's check the calculations!

The formula your instructor gave is indeed correct assuming that all outcomes are equally likely and independent of one another. Here's a little C program that prints the values for the number of collisions for small numbers of students:

#include <stdio.h>

const int kNumBuckets = 365;
const int kMaxNumber  = 50;

int main() {
  double probability = 1.0;
  for (int i = 1; i <= kMaxNumber; i++) {
    probability *= (double)(kNumBuckets - i + 1) / kNumBuckets;

    if (i % 10 == 0) {
      printf("Collision probability with %2d students: %g\n", i, 1.0 - probability);
    }
  }
  return 0;
}

Here's the output:

Collision probability with 10 students: 0.116948
Collision probability with 20 students: 0.411438
Collision probability with 30 students: 0.706316
Collision probability with 40 students: 0.891232
Collision probability with 50 students: 0.970374

These numbers don't agree with your professor, but they do agree with Wikipedia. I'm going to assume that this is just an error in your professor's materials. It might not hurt to contact them and ask for a clarification, since it's probably just an honest mistake.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • 1
    Well the question asks "how many students can you enter **before** a collision occurs." So are you saying that is the same as the question of "what is the probability any N students have the same birthday?" – xChaos Jun 13 '16 at 20:52
0

I evaluated the expression your professor gave. Good eye by you: I don't get the values you posted. The ones I see are closer to the birthday problem results. You're a good student for thinking and not accepting all that you're told.

/**
 * Implement the expression in the question to check.
 * User: mduffy
 * Date: 6/14/2016
 * Time: 8:03 AM
 * @link http://stackoverflow.com/questions/37798077/how-many-students-can-you-put-into-a-hash-table-before-a-collision-occurs
 */
public class CollisionProbability {

    public static void main(String[] args) {
        int m = (args.length > 0) ? Integer.parseInt(args[0]) : 365;
        int nMin = 10;
        int nMax = (args.length > 1) ? Integer.parseInt(args[1]) : 100;
        int dn = (args.length > 2) ? Integer.parseInt(args[2]) : 10;
        for (int n = nMin; n < nMax; n += dn) {
            System.out.println(String.format("m=%d n=%d p(collide)=%f", m, n, p(m, n)));
        }
    }

    public static double p(int m, int n) {
        double p = 1.0;
        for (int i = 1; i < n; ++i) {
            p *= (double)(m-i)/m;
        }
        return 1.0-p;
    }
}
duffymo
  • 305,152
  • 44
  • 369
  • 561
0

To answer quickly without much ado:

So my question is: when he asks "How many students can we hash into our table before a collision occurs," is that different from calculating the probability that any 2 students has the same birthday?

No, it is not different. The days of year 1..365 are exactly the same has having 365 hash buckets, and an acceptable hash function contains perfectly randomized values (which is also, wrongly, assumed in the birthday problem).

And if so, is there a formula for that?

Sure, Wikipedia has it https://en.wikipedia.org/wiki/Birthday_problem .

AnoE
  • 8,048
  • 1
  • 21
  • 36
0

I think your prof has done his calculations with M = 181 or 182, i.e. half a year. Running the calculation with these values gives

181, 10, 0.22359889333483407
181, 20, 0.6636461635832673
181, 30, 0.9215808021897809
181, 40, 0.9905555232124136
182, 10, 0.2224990010873642
182, 20, 0.6615484583220019
182, 30, 0.9204086626783813
182, 40, 0.9902893472869162
Salix alba
  • 7,536
  • 2
  • 32
  • 38