5

In this answer, the below code was posted for creating unique random alphanumeric strings. Could someone clarify for me how exactly they are ensured to be unique in this code and to what extent these are unique? If I rerun this method on different occasions would I still get unique strings?

Or did I just misunderstand the reply and these are not generating unique keys at all, only random?

I already asked this in a comment to that answer but the user seems to be inactive.

    public static string GetUniqueKey()
    {
        int maxSize = 8;
        char[] chars = new char[62];
        string a;
        a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
        chars = a.ToCharArray();
        int size = maxSize;
        byte[] data = new byte[1];
        RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
        crypto.GetNonZeroBytes(data);
        size = maxSize;
        data = new byte[size];
        crypto.GetNonZeroBytes(data);
        StringBuilder result = new StringBuilder(size);
        foreach (byte b in data)
        { result.Append(chars[b % (chars.Length - 1)]); }
        return result.ToString();
    }   
Community
  • 1
  • 1
JuhaKangas
  • 875
  • 5
  • 17
  • It is not a truly unique string, only random. That being said, it is a very strong alphanumeric random string that would be unlikely to reproduce itself in a small amount of occurrences, but the possibility does exist. It is impossible to determine if or when the string would be repeated, but if you had a billion strings generated, you would likely get a duplicate. – Howard Renollet Jun 03 '14 at 13:56
  • 1
    In crypto we generally assume that long enough random values are unique. Since this has only 47 bits I would not assume uniqueness. I'd only trust values of at least 120 bits. – CodesInChaos Jun 03 '14 at 14:45
  • 1
    This code is also pretty bad. It's biased (Number of characters does not divide 255) and it never outputs a `0` due to a off-by-one error. A better function for generating random strings is my answer to [How can I generate random alphanumeric strings in C#?](http://stackoverflow.com/a/13416143/445517), but obviously it still requires long enough outputs to achieve uniqueness. – CodesInChaos Jun 03 '14 at 14:47

4 Answers4

9

There is nothing in the code that guarantees that the result is unique. To get a unique value you either have to keep all previous values so that you can check for duplicates, or use a lot longer codes so that duplicates are practically impossible (e.g. a GUID). The code contains less than 48 bits of information, which is a lot less than the 128 bits of a GUID.

The string is just random, and although a crypto strength random generator is used, that is ruined by how the code is generated from the random data. There are some issues in the code:

  • A char array is created, that is just thrown away and replaced with another.
  • A one byte array of random data is created for no apparent reason at all, as it's not used for anything.
  • The GetNonZeroBytes method is used instead of the GetBytes method, which adds a skew to the distribution of characters as the code does nothing to handle the lack of zero values.
  • The modulo (%) operator is used to reduce the random number down to the number of characters used, but the random number can't be evenly divided into the number of characters, which also adds a skew to the distribution of characters.
  • chars.Length - 1 is used instead of chars.Length when the number is reduced, which means that only 61 of the predefined 62 characters can occur in the string.

Although those issues are minor, they are important when you are dealing with crypo strength randomness.

A version of the code that would produce a string without those issues, and give a code with enough information to be considered practically unique:

public static string GetUniqueKey() {
  int size = 16;
  byte[] data = new byte[size];
  RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();
  crypto.GetBytes(data);
  return BitConverter.ToString(data).Replace("-", String.Empty);
}
Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • The downsite of using hex is that it only uses 16 instead of 62 characters, increasing the size of an encoded 128 bit value from 22 to 32 characters. – CodesInChaos Jun 03 '14 at 14:51
  • @CodesInChaos: Yes, but the upside is that it's easy to convert the random data into text without adding any skew. – Guffa Jun 03 '14 at 14:57
  • With practically unique I assume you mean it just has a very low chance of collision? Is the size of 16 instead of 8 key in that statement? – JuhaKangas Jun 03 '14 at 15:34
  • @JuhaKangas: Yes, with such large numbers you get about the same chance of getting a duplicate by random as the chance of a bit in the number changing spontaneously in the memory chip. Using 16 bytes instead 8 is part of that, using all 8 bits in each random byte instead of dividing it down to less than 6 bits is the other part. – Guffa Jun 03 '14 at 15:58
  • @JuhaKangas: See https://en.wikipedia.org/wiki/Birthday_attack#Mathematics for a table with exact collision probabilities. As the code uses 16bytes of randomness the row with 128bits applies to this case. – Perseids Jun 03 '14 at 19:14
  • Absolutely awesome. Simple, complete and well explained. – pim Apr 03 '18 at 21:25
4

Uniqueness and randomness are mutually exclusive concepts. If a random number generator is truly random, then it can return the same value. If values are truly unique, although they may not be deterministic, they certainly aren't truly random, because every value generated removes a value from the pool of allowed values. This means that every run affects the outcome of subsequent runs, and at a certain point the pool is exhausted (barring of course the possibility of an infinitely-sized pool of allowed values, but the only way to avoid collisions in such a pool would be the use of a deterministic method of choosing values).

The code you're showing generates values that are very random, but not 100% guaranteed to be unique. After enough runs, there will be a collision.

Nathan
  • 2,093
  • 3
  • 20
  • 33
1

I need to generate 7 characters of an alphanumeric string. With a small search, I wrote the below code. Performance results are uploaded above

I have used hashtable Class to guarantee uniqueness and also used RNGCryptoServiceProvider Class to get high-quality random chars

results of generating 100.000 - 1.000.000 - 10.000.000 sample

Generating unique strings; thanks to nipul parikh

    public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
        {
            uniqueHashTable = new Hashtable();
            nonUniqueList = new List<string>();
            uniqueList = new List<string>();

            for (int i = 0; i < count; i++)
            {
                isUniqueGenerated = false;

                while (!isUniqueGenerated)
                {
                    uniqueStr = GetUniqueKey();
                    try
                    {
                        uniqueHashTable.Add(uniqueStr, "");
                        isUniqueGenerated = true;
                    }
                    catch (Exception ex)
                    {
                        nonUniqueList.Add(uniqueStr);
                        // Non-unique generated
                    }
                }
            }

            uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();

            return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
        }

        public static string GetUniqueKey()
        {
            int size = 7;
            char[] chars = new char[62];
            string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
            chars = a.ToCharArray();

            RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();

            byte[] data = new byte[size];
            crypto.GetNonZeroBytes(data);

            StringBuilder result = new StringBuilder(size);

            foreach (byte b in data)
                result.Append(chars[b % (chars.Length - 1)]);

            return Convert.ToString(result);
        }

Whole Console Application Code below;

    class Program
    {
        static string uniqueStr;
        static Stopwatch stopwatch;
        static bool isUniqueGenerated;
        static Hashtable uniqueHashTable;
        static List<string> uniqueList;
        static List<string> nonUniqueList;
        static Tuple<List<string>, List<string>> generatedTuple;

        static void Main(string[] args)
        {
            int i = 0, y = 0, count = 100000;

            while (i < 10 && y < 4)
            {
                stopwatch = new Stopwatch();

                stopwatch.Start();

                generatedTuple = GenerateUniqueList(count);

                stopwatch.Stop();

                Console.WriteLine("Time elapsed: {0} --- {1} Unique  --- {2} nonUnique",
                    stopwatch.Elapsed,
                    generatedTuple.Item1.Count().ToFormattedInt(),
                    generatedTuple.Item2.Count().ToFormattedInt());

                i++;
                if (i == 9)
                {
                    Console.WriteLine(string.Empty);
                    y++;
                    count *= 10;
                    i = 0;
                }
            }


            Console.ReadLine();
        }

        public static Tuple<List<string>, List<string>> GenerateUniqueList(int count)
        {
            uniqueHashTable = new Hashtable();
            nonUniqueList = new List<string>();
            uniqueList = new List<string>();

            for (int i = 0; i < count; i++)
            {
                isUniqueGenerated = false;

                while (!isUniqueGenerated)
                {
                    uniqueStr = GetUniqueKey();
                    try
                    {
                        uniqueHashTable.Add(uniqueStr, "");
                        isUniqueGenerated = true;
                    }
                    catch (Exception ex)
                    {
                        nonUniqueList.Add(uniqueStr);
                        // Non-unique generated
                    }
                }
            }

            uniqueList = uniqueHashTable.Keys.Cast<string>().ToList();

            return new Tuple<List<string>, List<string>>(uniqueList, nonUniqueList);
        }

        public static string GetUniqueKey()
        {
            int size = 7;
            char[] chars = new char[62];
            string a = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
            chars = a.ToCharArray();

            RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider();

            byte[] data = new byte[size];
            crypto.GetNonZeroBytes(data);

            StringBuilder result = new StringBuilder(size);

            foreach (byte b in data)
                result.Append(chars[b % (chars.Length - 1)]);

            return Convert.ToString(result);
        }
    }

    public static class IntExtensions
    {
        public static string ToFormattedInt(this int value)
        {
            return string.Format(CultureInfo.InvariantCulture, "{0:0,0}", value);
        }
    }
0

Using strictly alphanumeric characters restricts the pool you draw from to 62. Using the complete printable character set(ASCII 32-126) increases your pool to 94, decreasing the likelihood of collision and eliminating creating the pool separately.

tinstaafl
  • 6,908
  • 2
  • 15
  • 22