Let's say your corpus is the set of alphanumeric characters, `a-zA-Z0-9` (62 characters in total).
```java
char[] corpus = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
```
We can use `SecureRandom` to generate a seed, which asks the operating system for entropy (the exact source depends on the OS). The trick is to keep the distribution uniform: each byte has 256 possible values, but we only need 62, so I propose rejection sampling, i.e. throwing away any byte that does not map directly to an index in the corpus.
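```java
// Requires: import java.security.SecureRandom;
int generated = 0;
int desired = 6;
char[] result = new char[desired];
while (generated < desired) {
    // Ask for fresh seed bytes; Java bytes are signed, so each is in -128..127.
    byte[] ran = SecureRandom.getSeed(desired);
    for (byte b : ran) {
        // Keep only bytes that are already valid indices (0..61) and reject the
        // rest; every kept index is equally likely, so the output stays uniform.
        if (b >= 0 && b < corpus.length) {
            result[generated] = corpus[b];
            generated += 1;
            if (generated == desired) break;
        }
    }
}
```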
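To see why the rejection step matters, here is a small check (the class and method names are hypothetical) that counts how many of the 256 unsigned byte values would land on each corpus index if you skipped rejection and simply used `(b & 0xFF) % 62`.

```java
// Counts how many of the 256 possible unsigned byte values map to each
// corpus index under a plain modulo, with no rejection step.
public class ModuloBias {
    static int[] countHits(int corpusSize) {
        int[] hits = new int[corpusSize];
        for (int v = 0; v < 256; v++) {   // visit every unsigned byte value once
            hits[v % corpusSize]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] hits = countHits(62);
        // 256 = 4 * 62 + 8, so indices 0..7 receive 5 byte values, the rest 4.
        System.out.println("index 0: " + hits[0] + ", index 61: " + hits[61]);
    }
}
```

Since 256 = 4 × 62 + 8, the first eight characters (`a` to `h`) would each come up 5/4 as often as the others, which is exactly the bias rejection sampling avoids.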
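Improvements could include smarter wrapping of the generated values: as written, only the 62 byte values 0..61 out of 256 are kept, so roughly three quarters of the bytes are discarded (about 4.1 bytes consumed per character on average).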
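One possible form of that smarter wrapping, sketched under my own assumptions rather than as a tested drop-in: treat each byte as unsigned (0..255), reject only the 8 values 248..255, and map the rest with `% 62`. Because 248 = 4 × 62, every index then has exactly four byte values behind it, so the output stays uniform while about 97% of bytes are kept. The sketch also swaps the static `SecureRandom.getSeed` for an instance's `nextBytes`, since seed generation is meant for seeding other generators and can be slower; the class and method names are hypothetical.

```java
import java.security.SecureRandom;

public class WrappedSampler {
    static final char[] CORPUS =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();

    // Largest multiple of the corpus size that fits in 0..255 (4 * 62 = 248).
    static final int LIMIT = 256 - (256 % CORPUS.length);

    static String randomString(SecureRandom rng, int desired) {
        char[] result = new char[desired];
        byte[] buf = new byte[desired];
        int generated = 0;
        while (generated < desired) {
            rng.nextBytes(buf);                  // refill the buffer with random bytes
            for (byte b : buf) {
                int v = b & 0xFF;                // reinterpret the signed byte as 0..255
                if (v < LIMIT) {                 // reject 248..255 to avoid modulo bias
                    result[generated++] = CORPUS[v % CORPUS.length];
                    if (generated == desired) break;
                }
            }
        }
        return new String(result);
    }

    public static void main(String[] args) {
        System.out.println(randomString(new SecureRandom(), 6));
    }
}
```

If you don't need to control the byte handling yourself, `SecureRandom` inherits `nextInt(int bound)` from `java.util.Random`, which returns a uniformly distributed value in [0, bound), so `CORPUS[rng.nextInt(CORPUS.length)]` per character is a simpler alternative.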
When can we expect a repeat? Let's stick with the corpus of 62 and assume the output is perfectly uniform. Then this is the birthday problem, with N = 62^6 ≈ 5.68 × 10^10 possible six-character strings. We want to find the number of strings n at which the chance of at least one repeat is around 10%. The probability of at least one repeat among n strings is:
$$p(n) = 1 - \frac{N!}{N^n\,(N-n)!}$$
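The factorials in this formula are far too large to compute directly for N = 62^6, but the ratio N!/(N^n (N-n)!) equals the product of (1 - k/N) for k = 0..n-1, which can be summed in log space. A minimal sketch (names hypothetical) that evaluates the exact probability this way, with the classic 23-people/365-days case as a sanity check:

```java
public class BirthdayExact {
    // Exact probability of at least one repeat among n uniform draws from N values:
    // 1 - N!/(N^n (N-n)!) = 1 - prod_{k=0}^{n-1} (1 - k/N).
    // Summing logarithms avoids computing the enormous factorials directly.
    static double repeatProbability(double bigN, long n) {
        double logNoRepeat = 0.0;
        for (long k = 1; k < n; k++) {           // k = 0 contributes log(1) = 0
            logNoRepeat += Math.log1p(-k / bigN);
        }
        return -Math.expm1(logNoRepeat);          // 1 - exp(logNoRepeat), computed accurately
    }

    public static void main(String[] args) {
        System.out.println(repeatProbability(365, 23));  // classic birthday case, about 0.507
        double bigN = Math.pow(62, 6);                   // 56,800,235,584 possible strings
        System.out.println(repeatProbability(bigN, 109_000));
    }
}
```

For n = 109,000 the second line prints a value just under 0.1.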
Using the approximation from the Wikipedia article on the birthday problem, $p(n) \approx 1 - e^{-n^2/(2N)}$, and solving for n with p = 0.1 gives:
$$n \approx \sqrt{-2N \ln(0.9)}$$
This gives about 109,000 strings for a 10% chance of a repeat. For a 0.1% chance, it would take only about 10,700 strings.
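The same approximation, solved for n in general as n ≈ sqrt(-2N ln(1 - p)), is easy to evaluate for any target probability. A short sketch (names hypothetical) that reproduces both figures for N = 62^6:

```java
public class BirthdayApprox {
    // Invert p ≈ 1 - exp(-n^2 / (2N)) to get n ≈ sqrt(-2N ln(1 - p)).
    // log1p(-p) computes ln(1 - p) accurately even for very small p.
    static double drawsForProbability(double bigN, double p) {
        return Math.sqrt(-2.0 * bigN * Math.log1p(-p));
    }

    public static void main(String[] args) {
        double bigN = Math.pow(62, 6);
        System.out.printf("10%%:  %.0f%n", drawsForProbability(bigN, 0.10));   // about 109,400
        System.out.printf("0.1%%: %.0f%n", drawsForProbability(bigN, 0.001));  // about 10,660
    }
}
```

Because n grows only with the square root of N, adding characters to the string length is the effective lever: each extra character multiplies N by 62 and the expected n by about 7.9.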