Questions tagged [birthday-paradox]

The Birthday Paradox is a phenomenon in probability in which the probability of a population containing two individuals with the same property is much greater than would be intuitively expected. In its original form it describes the likelihood that any two individuals in a room share a birthday. Amongst other things, the Birthday Paradox affects cryptography, hashing and various applications of random number generators.

The Birthday Paradox is a phenomenon where two individual samples selected from a population may have the same value, for example the birthdays of individuals in a room. The paradox is that the probability of this coincidence happening is much higher than people expect. Amongst other things the phenomenon affects hashing, cryptography and various applications of random number generators.

Overview

The problem is originally defined as the probability of any two people in the room sharing the same birthday. The key point is that any two people in the room could share a birthday. People tend to naively misinterpret the problem as the probability of someone in the room sharing a birthday with a specific individual, which is the source of the cognitive bias that often causes people to underestimate the probability. This is the incorrect assumption – there is no requirement for the match to be to a specific individual and any two individuals could match.

This graph shows the probability of a shared birthday as number of people in the room increases. For 23 people, the probability of two sharing a birthday is just over 50%.

The probability of a match occurring between any two individuals is much higher than the probability of a match to a specific individual as the match does not have to be to a specific date. Rather, you only have to find two individuals that share the same birthday. From this graph (which can be found on the wikipedia page on the subject), we can see that we only need 23 people in the room for there to be a 50% chance of finding two that match in this way.

Some applications

  • Cryptographic hashing relies on the low probability of any two items having the same hash value. In order to achieve this the hash value must be very large so as to make it computationally infeasible to find another item that hashes to the same value. The Birthday paradox means that the width of the hash values must be very large in order for the probability of a collision to be sufficiently low.

  • In a Hash Table, the Birthday Paradox makes the likelihood of collisions quite high unless you have a perfect hash function that hashes a controlled set of source values to a unique set of hash values. In other cases, the hash table must be able to deal with relatively frequent collisions.

  • If you need a guaranteed unique sequence of random numbers, you cannot rely on simply generating random numbers, as the probability of a collision is quite high for relatively small sample sets. Instead, a set of unique numbers must be generated (perhaps just sequentially) and shuffled into a random order.

The tag is most applicable to questions about applications directly affected by the phenomenon (e.g. cryptographic hashing), or perhaps re-tagging a question where the poster has displayed obvious naievity about it.

57 questions
82
votes
11 answers

Random is barely random at all?

I did this to test the randomness of randint: >>> from random import randint >>> >>> uniques = [] >>> for i in range(4500): # You can see I was optimistic. ... x = randint(500, 5000) ... if x in uniques: ... raise Exception('We…
orokusaki
  • 55,146
  • 59
  • 179
  • 257
25
votes
1 answer

Probability of hash collision

I am looking for some precise math on the likelihood of collisions for MD5, SHA1, and SHA256 based on the birthday paradox. I am looking for something like a graph that says "If you have 10^8 keys, this is the probability. If you have 10^13 keys,…
Dark Nebula
  • 403
  • 1
  • 4
  • 6
25
votes
4 answers

Examples of Hash-Collisions?

For demonstration-purposes, what are a couple examples of strings that collide when hashed? MD5 is a relatively standard hashing-option, so this will be sufficient.
Sampson
  • 265,109
  • 74
  • 539
  • 565
18
votes
3 answers

How was there no collision among 50,000 random 7-digit hex strings? (The Birthday Problem)

I've encountered some code that generates a number of UUIDs via UUID.randomUUID(), takes the last 7 digits of each (recent versions of UUID are uniformly distributed in terms of entropy), and uses that as a key to insert rows into a database. I…
Andrew Cheong
  • 29,362
  • 15
  • 90
  • 145
11
votes
5 answers

How random is Random.Next()?

I have been doing some testing on the Random class and I have used the following code: while (x++ <= 5000000) { y = rnd.Next(1, 5000000); if (!data.Contains(y)) data.Add(y); else { Console.WriteLine("Cycle {2}:…
Bhaskar
  • 10,537
  • 6
  • 53
  • 64
7
votes
5 answers

Uniquely identifying URLs with one 64-bit number

This is basically a math problem, but very programing related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what kind of collision frequency should I expect? How does the answer change if…
itsadok
  • 28,822
  • 30
  • 126
  • 171
6
votes
1 answer

java random string generation and birthday paradox

i need to write a random string generation class which generates 7char strings from a 31-char charset of numbers and some alphabets (10+26-5 , 5 vowels omitted). simple maths gives a set of 31^7 possible combinations ~ 27.5 billion. i have questions…
Ashish Thukral
  • 1,445
  • 1
  • 16
  • 26
5
votes
1 answer

Can someone please clarify the Birthday Effect for me?

Please help interpret the Birthday effect as described in Wikipedia: A birthday attack works as follows: Pick any message m and compute h(m). Update list L. Check if h(m) is in the list L. if (h(m),m) is already in L, a colliding message pair…
Mark
  • 1,214
  • 10
  • 24
4
votes
3 answers

Why am I getting dups with random.shuffle in Python?

For a list of 10 ints, there are 10! possible orders or permutations. Why does random.shuffle give duplicates after only 5000 tries? >>> L = range(10) >>> rL = list() >>> for i in range(5000): ... random.shuffle(L) ... rL.append(L[:]) ...…
telliott99
  • 7,762
  • 4
  • 26
  • 26
4
votes
2 answers

Birthday Paradox List is nonetype

I'm trying to solve the Birthday Paradox with Python. I'm close but the last piece has me at a loss. I'm using random to generate a list of numbers given a range and number of items to create. That works. I then check to see if a list (generated…
h-man
  • 83
  • 5
3
votes
3 answers

Why isn't this Random number generation code working?

I'm writing a program for proving the 'Birthday Paradox'. For i = 0 To (pnum - 1) days(i) = rnd(h:=365) Next It generates a random number for every i (days(i)) between 1 and 365, the function is: Private Function rnd(h As…
Dave12311
  • 29
  • 5
2
votes
2 answers

Random token generation - a supposedly unlikely collision occurred

A couple months ago, we were using UUIDs to generate random string IDs that needed to be unique across the board. I then changed the algorithm in order to save some data and index space in our database. I tested a few ways to generate unique string…
Jeff
  • 165
  • 11
2
votes
3 answers

Birthday Paradox: How to programmatically estimate the probability of 3, and N, people sharing a birthday

There are extensive resources on the internet discussing the famous Birthday Paradox. It is clear to me how you calculate the probability of two people sharing a birthday i.e. P(same) = 1 - P(different). However if I ask myself something apparently…
user425727
2
votes
4 answers

How many students can you put into a hash table before a collision occurs?

My professor gave us this slide while explaining Hash Collision probabilities: When I looked up the probabilities of two people having the same birthday in the "Birthday Paradox", I found on Wikipedia and other sources that the probability at n=10…
xChaos
  • 41
  • 1
  • 2
2
votes
2 answers

Why am I getting this error in my JAVA code?

The birthday paradox says that the probability that two people in a room will have the same birthday is more than half as long as the number of people in the room (n), is more than 23. This property is not really a paradox, but many people…
Chrismar
  • 93
  • 1
  • 9
1
2 3 4