15

I would like to generate a long UUID - something like the session key used by gmail. It should be at least 256 chars and no more than 512. It can contain all alpha-numeric chars and a few special chars (the ones below the function keys on the keyboard). Has this been done already or is there a sample out there?

C++ or C#

Update: A GUID is not enough. We already have been seeing collisions and need to remedy this. 512 is the max as of now because it will prevent us from changing stuff that was already shipped.

Update 2: For the guys who are insisting about how unique the GUID is, if someone wants to guess your next session ID, they don't have to compute the combinations for the next 1 trillion years. All they have to do is use constrain the time factor and they will be done in hours.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Ron
  • 171
  • 1
  • 1
  • 5
  • 10
    Is globally unique not unique enough? – John Rasch May 19 '10 at 17:10
  • Is there a reason it must be specifically between 256 and 512 chars? if not and you just pulled those numbers because they seem "long enough" `System.Guid.NewGuid().ToString("N");` may be good enough. – Scott Chamberlain May 19 '10 at 17:11
  • 7
    @John: What, your application doesn't need to support an interstellar empire? – Steven Sudit May 19 '10 at 17:11
  • 16
    Regardless of whether what he's asking is unnecessary, it seems pointless to downvote the guy for an honest question... – Dan Tao May 19 '10 at 17:12
  • 4
    Why don't you just answer his question without all the sarcasm and bad vibes? – bl4ckb0l7 May 19 '10 at 17:13
  • 1
    @bitschnau i disagree. it is possible that asker is not aware that Guid is unique enought. if it is just that he really don't need LONG guid. if he needs only some sequence of bytes then again, it is not guid that he needs – Andrey May 19 '10 at 17:16
  • @Ron, Guid IS unique enought if you generate it well. – Andrey May 19 '10 at 17:18
  • @bitschnau - Well something is terribly wrong with MS implementation because after a few billions we have had collisions and NEED to fix it. – Ron May 19 '10 at 17:18
  • @Kman - allow folks from someplace to post messages to some forums w/o the security services guessing their next session ID. – Ron May 19 '10 at 17:19
  • 1
    @Ron http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique "This will run for a lot more than hours. Assuming it loops at 1Ghz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years. Which is about 83 billion times longer than the age of the universe." – Andrey May 19 '10 at 17:21
  • I am not sure why all the bad vibes. To see what i am talking about just do a simple test. Create new GUIDs and insert into a hash table. See how long it takes before you get collisions. In our case it took less than two days usign the .net system guid class. – Ron May 19 '10 at 17:25
  • 1
    reading your update I would say there is a bigger issue here, you should not be seing collisions with a GUID, I would suggest that you look into how your GUIDs are generated, otherwise you may see the same issue with a custom rolled UUID – Pharabus May 19 '10 at 17:26
  • @Ron - may be you have a bug and try to reuse the same id? btw, i posted a solution, did you try it? – Andrey May 19 '10 at 17:28
  • Woah. I never thought about collisions, but if you look at Microsoft's implementation: http://msdn.microsoft.com/en-us/library/system.guid.aspx, it says: "A GUID is a 128-bit integer (16 bytes) that can be used across all computers and networks wherever a unique identifier is required. Such an identifier has a very low probability of being duplicated." *low probablility* ... color me shocked. – Ogre Psalm33 May 19 '10 at 17:35
  • @Ogre: But consider that the GUID is required to be random. How could they *guarantee* uniqueness, without storing all GUIDs ever created? It's not so surprising when you consider the alternative. – Dan Tao May 19 '10 at 17:43
  • 1
    @Dan because a portion of the GUID is a timestamp, as long as you don't turn back the clock they will be diffrent. Another portion is a machine id so two machines with the same timestamp will return different numbers. – Scott Chamberlain May 19 '10 at 17:49
  • @Scott: Random GUIDs (e.g., `Guid.NewGuid`) do not have timestamps or machine id's. – Stephen Cleary May 19 '10 at 17:50
  • @Ron: "constraining the time factor" won't make a whit of difference because `Guid.NewGuid` does not contain a timestamp. – Stephen Cleary May 19 '10 at 17:56
  • did this just turn from a 'I have collisions' into a 'I dont want people to hijack a session' question? – Pharabus May 19 '10 at 17:58
  • Yes. I want unique session id's – Ron May 19 '10 at 18:00
  • @Ron, I thought I would give you the benefit of the doubt here and I tried your simple test. As I expected within a few minutes and 47995854 guids later I ran out of memory and no collisions. – Chris Taylor May 19 '10 at 18:35
  • 1
    @Ron: With all due respect, you're trying to solve a problem that doesn't exist. See my answer for more details. – Steven Sudit May 19 '10 at 19:09
  • @Scott Chamberlain: I think you're mistaken. There's neither timestamp nor machine ID in Microsoft's current GUID implementation. – Dan Tao May 19 '10 at 23:06
  • 1
    Some detail on what you're *doing* with your GUID's would be nice. Because unless you're using your own made-up algorithm to generate them, they *are* unique. Do you hash them or something (which could introduce hash collisions)? How are you generating them? Where are you storing them? Where are you comparing them? You're doing *something* wrong, and the answer is not "use more bits", but instead to correct the error – jalf May 20 '10 at 03:06
  • @Andrey All of your thoughts are assumptions in my eyes. I found the question pretty straight forward and if you have the feeling the asker might be asking for a wrong reason or is taking some wrong assumptions himself, than I think you should ask for clarification. But not downvote and make jokes of the asker. That's not the idea of stackoverflow.com in my opinnion. @Ron That's exactly what I am talking about. Maybe funny, but pointless humor! :-) – bl4ckb0l7 May 20 '10 at 10:12
  • @bitschnau are you blaming the right person? i send the first and working solution, and didn't make any jokes. – Andrey May 20 '10 at 11:20
  • there is a very little chance that GUID's duplicate but if you want to eliminate that risk you can concat two or three GUID's together so that the chance would be minimize to a 0.000000001 extent – Atif Imtiaz Jan 23 '13 at 10:13

9 Answers9

31

If your GUIDs are colliding, may I ask how you're generating them?

It is astronomically improbable that GUIDs would collide as they are based on:

  • 60 bits - timestamp during generation
  • 48 bits - computer identifier
  • 14 bits - unique ID
  • 6 bits are fixed

You would have to run the GUID generation on the same machine about 50 times in the exact same instant in time in order to have a 50% chance of collision. Note that instant is measured down to nanoseconds.

Update:

As per your comment "putting GUIDs into a hashtable"... the GetHashCode() method is what is causing the collision, not the GUIDs:

public override int GetHashCode()
{
    return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}

You can see it returns an int, so if you have more than 2^32 "GUIDs" in the hashtable, you are 100% going to have a collision.

Community
  • 1
  • 1
John Rasch
  • 62,489
  • 19
  • 106
  • 139
  • Try inserting in sql-server 2005. That's were we saw the collisions – Ron May 19 '10 at 17:37
  • 3
    @Ron: you could always have SQL give you the GUID. – Matthew Whited May 19 '10 at 17:39
  • 1
    This is a good point. The OP should modify his test program to check for equality when he gets a collision. That way you can tell the difference between 2 GUIDs with colliding hash codes and 2 identical GUIDs. – A. Levy May 19 '10 at 17:42
16

As per your update2 you are correct on Guids are predicable even the msdn references that. here is a method that uses a crptographicly strong random number generator to create the ID.

static long counter; //store and load the counter from persistent storage every time the program loads or closes.

public static string CreateRandomString(int length)
{
    long count = System.Threading.Interlocked.Increment(ref counter);
    int PasswordLength = length;
    String _allowedChars = "abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNOPQRSTUVWXYZ23456789";
    Byte[] randomBytes = new Byte[PasswordLength];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(randomBytes);
    char[] chars = new char[PasswordLength];
    int allowedCharCount = _allowedChars.Length;
    for (int i = 0; i < PasswordLength; i++)
    {
        while(randomBytes[i] > byte.MaxValue - (byte.MaxValue % allowedCharCount))
        {
            byte[] tmp = new byte[1];
            rng.GetBytes(tmp);
            randomBytes[i] = tmp[0];
        }
        chars[i] = _allowedChars[(int)randomBytes[i] % allowedCharCount];
    }
    byte[] buf = new byte[8];
    buf[0] = (byte) count;
    buf[1] = (byte) (count >> 8);
    buf[2] = (byte) (count >> 16);
    buf[3] = (byte) (count >> 24);
    buf[4] = (byte) (count >> 32);
    buf[5] = (byte) (count >> 40);
    buf[6] = (byte) (count >> 48);
    buf[7] = (byte) (count >> 56);
    return Convert.ToBase64String(buf) + new string(chars);
}

EDIT I know there is some biasing because allowedCharCount is not evenly divisible by 255, you can get rid of the bias throwing away and getting a new random number if it lands in the no-mans-land of the remainder.

EDIT2 - This is not guaranteed to be unique, you could hold a static 64 bit(or higher if necessary) monotonic counter encode it to base46 and have that be the first 4-5 characters of the id.

UPDATE - Now guaranteed to be unique

UPDATE 2: Algorithm is now slower but removed biasing.

EDIT: I just ran a test, I wanted to let you know that ToBase64String can return non alphnumeric charaters (like 1 encodes to "AQAAAAAAAAA=") just so you are aware.

New Version:

Taking from Matt Dotson's answer on this page, if you are no so worried about the keyspace you can do it this way and it will run a LOT faster.

public static string CreateRandomString(int length)
{
    length -= 12; //12 digits are the counter
    if (length <= 0)
        throw new ArgumentOutOfRangeException("length");
    long count = System.Threading.Interlocked.Increment(ref counter);
    Byte[] randomBytes = new Byte[length * 3 / 4];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(randomBytes);

    byte[] buf = new byte[8];
    buf[0] = (byte)count;
    buf[1] = (byte)(count >> 8);
    buf[2] = (byte)(count >> 16);
    buf[3] = (byte)(count >> 24);
    buf[4] = (byte)(count >> 32);
    buf[5] = (byte)(count >> 40);
    buf[6] = (byte)(count >> 48);
    buf[7] = (byte)(count >> 56);
    return Convert.ToBase64String(buf) + Convert.ToBase64String(randomBytes);
}
Community
  • 1
  • 1
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • 1
    Technically you're still not guaranteed to be unique. When you run out of counter bits and start cycling, you'll get the very remote possibility of collision. If you're going to keep a counter, the you can just use System.Numerics.BigInteger and count forever. – Matt Dotson May 19 '10 at 20:09
  • 4
    if he generates 11,574 new id's per second (1 billion per day) 24 hours a day, 356 days a year, it will take 50.5 million years for it to wrap around. I think 64 bit will be fine. – Scott Chamberlain May 19 '10 at 21:11
  • Would using your method of generating a random string, say, be safe for generating unique product keys? – David Anderson Apr 18 '12 at 08:21
  • @DavidAnderson Yes, but if the product key is going to be verified on the client without connecting to a server to check if it is valid it would not work. For that kind of method you need some way to verify that the key is "Correct". See [this SO Question](http://stackoverflow.com/questions/599837/how-to-generate-and-validate-a-software-license-key) for instructions to do it that way. – Scott Chamberlain Apr 18 '12 at 13:58
  • I already have an activation server wrote and ready, I just wanted to make sure the algorithm was good enough for key generation (I tested it generating 1,000,000 unique keys and there were no collisions), so it seems like a good method. :) I was struggling with Base32Encoding and things of such. – David Anderson Apr 18 '12 at 19:07
  • Check out RubbishSoft LongGuid: http://rubbishsoft.com/libraries/longguid (Yes, it's a joke). – Giorgi Chakhidze Jul 18 '13 at 17:31
10
StringBuilder sb = new StringBuilder();
for (int i = 0; i < HOW_MUCH_YOU_WANT / 32; i++)
   sb.Append(Guid.NewGuid().ToString("N"));
return sb.ToString();

but what for?

Andrey
  • 59,039
  • 12
  • 119
  • 163
8

The problem here is why, not how. A session ID bigger than a GUID is useless, because it's already big enough to thwart brute force attacks.

If you're concerned about predicting GUID's, don't be. Unlike the earlier, sequential GUID's, V4 GUID's are cryptographically secure, based on RC4. The only exploit I know about depends on having full access to the internal state of the process that's generating the values, so it can't get you anywhere if all you have is a partial sequence of GUID's.

If you're paranoid, generate a GUID, hash it with something like SHA-1, and use that value. However, this is a waste of time. If you're concerned about session hijacking, you should be looking at SSL, not this.

Steven Sudit
  • 19,391
  • 1
  • 51
  • 53
3
byte[] random = new Byte[384];

//RNGCryptoServiceProvider is an implementation of a random number generator.
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(random);
var sessionId = Convert.ToBase64String(random);

You can replace the "/" and "=" from the base64 encoding to be whatever special characters are acceptable to you.

Base64 encoding creates a string that is 4/3 larger than the byte array (hence the 384 bytes should give you 512 characters).

This should give you orders of magnatude more values than a base16 (hex) encoded guid. 512^16 vs 512^64

Also if you are putting these in sql server, make sure to turn OFF case insensitivity.

Matt Dotson
  • 5,887
  • 1
  • 23
  • 23
1

There are two really easy ways (C#):

1) Generate a bunch of Guids using Guid.NewGuid().ToString("N"). each GUID will be 32 characters long, so just generate 8 of them and concatenate them to get 256 chars.

2) Create a constant string (const string sChars = "abcdef") of acceptable characters you'd like in your UID. Then in a loop, randomly pick characters from that string by randomly generating a number from 0 to the length of the string of acceptable characters (sChars), and concatenate them in a new string (use stringbuilder to make it more performant, but string will work too).

Jeremy
  • 44,950
  • 68
  • 206
  • 332
1

You may want to check out boost's Uuid Library. It supports a variety of generators, including a random generator that might suit your needs.

Alan
  • 13,510
  • 9
  • 44
  • 50
0

https://github.com/bigfatsea/SUID Simple Unique Identifier

Though it's in Java, but can be easily ported to any other language. You may expect duplicated ids on same instance 136 years later, good enough for medium-small projects.

Example:

long id = SUID.id().get();
BigFatSea
  • 473
  • 1
  • 6
  • 13
0

I would use some kind of hash of std::time() probably sha512. ex (using crypto++ for the sha hash + base64 encoding).

#include <iostream>
#include <sstream>
#include <ctime>
#include <crypto++/sha.h>
#include <crypto++/base64.h>

int main() {
    std::string digest;
    std::stringstream ss("");
    ss << std::time(NULL);

    // borrowed from http://www.cryptopp.com/fom-serve/cache/50.html
    CryptoPP::SHA512 hash;
    CryptoPP::StringSource foo(ss.str(), true,
        new CryptoPP::HashFilter(hash,
           new CryptoPP::Base64Encoder(
               new CryptoPP::StringSink(digest))));
    std::cout << digest << std::endl;

    return 0;
}