-1

I want to generate unique and secure strings randomly and I want to do it over time. I want to choose the shortest possible length for my use (Big enough so that random tries fail with good chance). I Also want the process to be fast. Using the following code and testing it for uniqueness, duplicates occur sooner than I expected. I'm not sure if there is any problem.

P.S.: Is it secure to use a pool for generated numbers?

P.P.S: I prefer not to add '-' and '_' to the alphabet. Is it worth removing it?

string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
string Random(int length,string alphabet)
    {
        StringBuilder result = new StringBuilder();
        using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
        {
            for (int i = 0; i < length; i++)
            {
                byte[] buffer = new byte[sizeof(uint)];
                rng.GetBytes(buffer);
                uint num = BitConverter.ToUInt32(buffer, 0);
                result.Append(alphabet[(int)(num % (uint)alphabet.Length)]);
            }
        }

        return result.ToString();
    }
Ali Sh
  • 117
  • 1
  • 7
  • 1
    Possible duplicate of [Random number generator only generating one random number](https://stackoverflow.com/questions/767999/random-number-generator-only-generating-one-random-number) – Lucifer Jun 18 '18 at 07:21
  • Maybe it might help to run a spell check on your code. Looking at it, the code will not run at all. – t0mm13b Jun 18 '18 at 07:22
  • @t0mm13b I was chrome spell check showing me I misspelled "length" and I changed it during the posting. code is correct. and I edited the post. – Ali Sh Jun 18 '18 at 07:35
  • @Lucifer The answer in the mentioned post, said to use one instance of Random. but for my application it is not possible. – Ali Sh Jun 18 '18 at 07:37
  • 3
    `duplicates occur sooner than I expected.` When did you expect them? When did they occur? – mjwills Jun 18 '18 at 07:37
  • 2
    This feels like a XY Problem - https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem . _Why_ do you want to do this? – mjwills Jun 18 '18 at 07:38
  • @mjwills after only less than 1% of all possible strings – Ali Sh Jun 18 '18 at 07:41
  • You like to live a dangerous life. if `alphabet.Length` isn't a divisor of 256 you'll get some letters to be more common than others. To make a clear example: if `alphabet.Length == 129`, then the first 127 letters have two values that are corresponding (alphabet[0] == 0 and 129. alphabet[1] == 1 and 130 and so on), while the last two letters (alphabet[127] and alphabet[128]) have a single value each. – xanatos Jun 18 '18 at 07:49
  • @mjwills I am going to include it in a RandomUtility class and use it for (file names, tokens for a custom protocol, custom unique ids) – Ali Sh Jun 18 '18 at 07:53
  • 2
    "Short" and "secure" are two things that generally don't go well together. Anything that's sufficiently short can also be brute forced and/or predicted easily. What's more, you don't want things to be secure by having a name that's hard to guess, you want them to be secure by not being accessible at all. If you use 16 random bytes and convert them to a 32 character hexstring and be done with it, you'll have 128 bits worth of randomness. That's certainly good enough to prevent collisions. So is `Guid.NewGuid()` for that matter, though that's not cryptographically secure. – Jeroen Mostert Jun 18 '18 at 08:07
  • @JeroenMostert noted! I will change my strategies. And by making the strings longer I get enough unique strings. BUT... the question in the corner of my mind, why this function is generates duplicates so soon (`after only less than 1% of all possible strings generated` )? – Ali Sh Jun 18 '18 at 08:24
  • Have you read xanatos' https://en.wikipedia.org/wiki/Birthday_problem link @AliSh? – mjwills Jun 18 '18 at 09:38

1 Answers1

0

In general, in your Alphabet each character has 6 bits of "information" (there are 64 characters). So a string of length = 10 has 60 bits of information. For the birthday paradox after sqrt(2^60), that is 2^30, you should have a 0.5 possibility (50%) of a collision. 2^30 is quite big, but if the number of characters is lower then the collision will happen much earlier.

Note that there is a possible optimization that can be done in your code (preallocating the "correct" length of StringBuilder) and there is a bug:

public const string BaseAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

public static string Random(int length, string alphabet = BaseAlphabet)
{
    StringBuilder result = new StringBuilder(length);
    uint maxValue = (uint)((((ulong)uint.MaxValue) + 1) / (uint)alphabet.Length * (uint)alphabet.Length);

    using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
    {
        for (int i = 0; i < length; i++)
        {
            uint num;

            do
            {
                byte[] buffer = new byte[sizeof(uint)];
                rng.GetBytes(buffer);
                num = BitConverter.ToUInt32(buffer, 0);
            }
            while (num >= maxValue);

            result.Append(alphabet[(int)(num % (uint)alphabet.Length)]);
        }
    }

    return result.ToString();
}

Some characters could be more probable than others in the code you wrote. This because if (uint.MaxValue + 1) (the whole range of possible numbers generated by the rng) isn't perfectly divisible by alphabet.Length then the characters at the end of alphabet are less probable than the characters at the beginning.

xanatos
  • 109,618
  • 12
  • 197
  • 280