41

I have looked all of the place for this and I can't seem to get a complete answer for this. So if the answer does already exist on stackoverflow then I apologize in advance.

I want a unique and random ID so that users in my website can't guess the next number and just hop to someone else's information. I plan to stick to a incrementing ID for the primary key but to also store a random and unique ID (sort of a hash) for that row in the DB and put an index on it.

From my searching I realize that I would like to avoid collisions and I have read some mentions of SHA1.

My basic requirements are

  • Something smaller than a GUID. (Looks horrible in URL)
  • Must be unique
  • Avoid collisions
  • Not a long list of strange characters that are unreadable.

An example of what I am looking for would be www.somesite.com/page.aspx?id=AF78FEB

I am not sure whether I should be implementing this in the database (I am using SQL Server 2005) or in the code (I am using C# ASP.Net)

EDIT:

From all the reading I have done I realize that this is security through obscurity. I do intend having proper authorization and authentication for access to the pages. I will use .Net's Authentication and authorization framework. But once a legitimate user has logged in and is accessing a legimate (but dynamically created page) filled with links to items that belong to him. For example a link might be www.site.com/page.aspx?item_id=123. What is stopping him from clicking on that link, then altering the URL above to go www.site.com/page.aspx?item_id=456 which does NOT belong to him? I know some Java technologies like Struts (I stand to be corrected) store everything in the session and somehow work it out from that but I have no idea how this is done.

uriDium
  • 13,110
  • 20
  • 78
  • 138
  • A great URL-friendly encoding to turn such numeric values into shorter text values is base62, which is alphanumeric. Implementations are also quite rare, unfortunately. It is tricky to get right. Instead, you could look at base64-url, a URL-friendly variant of base64 that is more common than base62. – Timo May 28 '20 at 16:22

12 Answers12

22

Raymond Chen has a good article on why you shouldn't use "half a guid", and offers a suitable solution to generating your own "not quite guid but good enough" type value here:

GUIDs are globally unique, but substrings of GUIDs aren't

His strategy (without a specific implementiation) was based on:

  • Four bits to encode the computer number,
  • 56 bits for the timestamp, and
  • four bits as a uniquifier.

We can reduce the number of bits to make the computer unique since the number of computers in the cluster is bounded, and we can reduce the number of bits in the timestamp by assuming that the program won’t be in service 200 years from now.

You can get away with a four-bit uniquifier by assuming that the clock won’t drift more than an hour out of skew (say) and that the clock won’t reset more than sixteen times per hour.

Nuno André
  • 4,739
  • 1
  • 33
  • 46
Zhaph - Ben Duguid
  • 26,785
  • 5
  • 80
  • 117
  • 2
    Deleted my answer on the basis of this :) I still think that securing the pages properly and then using an incrementing ID is the best option. – Jon Skeet Feb 09 '09 at 20:14
  • Thanks, it's an excellent article on GUID anatomy! (and not because it made Jon Skeet remove his answer ;) – DK. Feb 09 '09 at 20:23
  • Actually, reading that link and given he's using the same algorithm from the same machine, he could easily cut that from 16 bytes to 10 and still have room left over (128 - 48 - 6 = 74). Raymond even suggests trimming another 10 'uniquifier' bits, getting it down to 8 bytes. – Joel Coehoorn Feb 09 '09 at 20:24
  • Why's there not a badge for this? ;) – Zhaph - Ben Duguid Feb 09 '09 at 20:38
  • Agreed that properly securing the page and then using incrementing id's would be the way to go - infact there are good perf. reasons not to use a GUID or GUID-like as an id in a database, especially for indexed columns – Zhaph - Ben Duguid Feb 09 '09 at 20:40
  • The latest version of the GUID (version 4) is basically a big random number. So you can just take a subset of it, you just have to understand what means. See http://stackoverflow.com/a/42428932/984780 – Luis Perez Feb 24 '17 at 01:00
  • The document that Raymond references covers both the time/Mac based option and the less unique version 4 option. – Zhaph - Ben Duguid Feb 24 '17 at 07:36
16

UPDATE (4 Feb 2017):
Walter Stabosz discovered a bug in the original code. Upon investigation there were further bugs discovered, however, extensive testing and reworking of the code by myself, the original author (CraigTP) has now fixed all of these issues. I've updated the code here with the correct working version, and you can also download a Visual Studio 2015 solution here which contains the "shortcode" generation code and a fairly comprehensive test suite to prove correctness.

One interesting mechanism I've used in the past is to internally just use an incrementing integer/long, but to "map" that integer to a alphanumeric "code".

Example

Console.WriteLine($"1371 as a shortcode is: {ShortCodes.LongToShortCode(1371)}");
Console.WriteLine($"12345 as a shortcode is: {ShortCodes.LongToShortCode(12345)}");
Console.WriteLine($"7422822196733609484 as a shortcode is: {ShortCodes.LongToShortCode(7422822196733609484)}");

Console.WriteLine($"abc as a long is: {ShortCodes.ShortCodeToLong("abc")}");
Console.WriteLine($"ir6 as a long is: {ShortCodes.ShortCodeToLong("ir6")}");
Console.WriteLine($"atnhb4evqqcyx as a long is: {ShortCodes.ShortCodeToLong("atnhb4evqqcyx")}");    

// PLh7lX5fsEKqLgMrI9zCIA   
Console.WriteLine(GuidToShortGuid( Guid.Parse("957bb83c-5f7e-42b0-aa2e-032b23dcc220") ) );      

Code

The following code shows a simple class that will change a long to a "code" (and back again!):

public static class ShortCodes
{
    // You may change the "shortcode_Keyspace" variable to contain as many or as few characters as you
    // please.  The more characters that are included in the "shortcode_Keyspace" constant, the shorter
    // the codes you can produce for a given long.
    private static string shortcodeKeyspace = "abcdefghijklmnopqrstuvwxyz0123456789";

    public static string LongToShortCode(long number)
    {
        // Guard clause.  If passed 0 as input
        // we always return empty string.
        if (number == 0)
        {
            return string.Empty;
        }

        var keyspaceLength = shortcodeKeyspace.Length;
        var shortcodeResult = "";
        var numberToEncode = number;
        var i = 0;
        do
        {
            i++;
            var characterValue = numberToEncode % keyspaceLength == 0 ? keyspaceLength : numberToEncode % keyspaceLength;
            var indexer = (int) characterValue - 1;
            shortcodeResult = shortcodeKeyspace[indexer] + shortcodeResult;
            numberToEncode = ((numberToEncode - characterValue) / keyspaceLength);
        }
        while (numberToEncode != 0);
        return shortcodeResult;
    }

    public static long ShortCodeToLong(string shortcode)
    {
        var keyspaceLength = shortcodeKeyspace.Length;
        long shortcodeResult = 0;
        var shortcodeLength = shortcode.Length;
        var codeToDecode = shortcode;
        foreach (var character in codeToDecode)
        {
            shortcodeLength--;
            var codeChar = character;
            var codeCharIndex = shortcodeKeyspace.IndexOf(codeChar);
            if (codeCharIndex < 0)
            {
                // The character is not part of the keyspace and so entire shortcode is invalid.
                return 0;
            }
            try
            {
                checked
                {
                    shortcodeResult += (codeCharIndex + 1) * (long) (Math.Pow(keyspaceLength, shortcodeLength));
                }
            }
            catch(OverflowException)
            {
                // We've overflowed the maximum size for a long (possibly the shortcode is invalid or too long).
                return 0;
            }
        }
        return shortcodeResult;
    }
}

}

This is essentially your own baseX numbering system (where the X is the number of unique characters in the shortCode_Keyspace constant.

To make things unpredicable, start your internal incrementing numbering at something other than 1 or 0 (i.e start at 184723) and also change the order of the characters in the shortCode_Keyspace constant (i.e. use the letters A-Z and the numbers 0-9, but scamble their order within the constant string. This will help make each code somewhat unpredictable.

If you're using this to "protect" anything, this is still security by obscurity, and if a given user can observe enough of these generated codes, they can predict the relevant code for a given long. The "security" (if you can call it that) of this is that the shortCode_Keyspace constant is scrambled, and remains secret.

EDIT: If you just want to generate a GUID, and transform it to something that is still unique, but contains a few less characters, this little function will do the trick:

public static string GuidToShortGuid(Guid gooid)
{
    string encoded = Convert.ToBase64String(gooid.ToByteArray());
    encoded = encoded.Replace("/", "_").Replace("+", "-");
    return encoded.Substring(0, 22);
}
CraigTP
  • 44,143
  • 8
  • 72
  • 99
  • @CraidTP I think your code has a bug in it. See the comments in the `Example` section that I added to your answer. – Walter Stabosz Feb 01 '17 at 15:56
  • 1
    @WalterStabosz You're right. In fact, upon further investigation there were a number of other bugs discovered in the original code. I've completely reworked the code to fix the bugs and updated the code here with the correct working version. – CraigTP Feb 04 '17 at 10:32
14

If you don't want other users to see people information why don't you secure the page which you are using the id?

If you do that then it won't matter if you use an incrementing Id.

David Basarab
  • 72,212
  • 42
  • 129
  • 156
  • The pages are secure but I will need a list of items that belong to that user to appear in the page. So I don't want them to try and see items which aren't theirs by tampering with the URL. – uriDium Feb 09 '09 at 20:11
  • If the page is secure, how could they see items which aren't theirs by tampering? – Jon Skeet Feb 09 '09 at 20:13
  • LongHorn is saying if it was secured properly it wouldn't matter if they guessed the URL. – DJ. Feb 09 '09 at 20:14
  • 2
    This is the right answer. Why do you (the questioner) care what people do if the site is secure? – paxdiablo Feb 09 '09 at 20:15
  • Let me try clarify, I am not talking about guessing URLs. Those pages will be protected and I will use .Nets Authentication and Authorization. I am talking about www.site.com/page.aspx?item=123 what is stoping him from changing the url to www.site.com/page.aspx?item=456 and item 456 is not his. – uriDium Feb 09 '09 at 20:21
  • You should check the userId of the user that is logged in. If there userId is 123 and they put item 456 123 != 456 reject them. – David Basarab Feb 09 '09 at 20:27
  • Is this a 3+ tier arch ? If so it seem to me that the "who can see what data" security should be enforced below UI layer. What if you want to open up the API to 3rd parties ? Or customer decide one day that they want an WPF app also on top of that ? – pero Feb 09 '09 at 21:11
  • "User can see only their items" have nothing to do with UI presentation. – pero Feb 09 '09 at 21:13
10

[In response to the edit]
You should consider query strings as "evil input". You need to programmatically check that the authenticated user is allowed to view the requested item.

if( !item456.BelongsTo(user123) )
{
  // Either show them one of their items or a show an error message.
}
Greg
  • 16,540
  • 9
  • 51
  • 97
3

You could randomly generate a number. Check that this number is not already in the DB and use it. If you want it to appear as a random string you could just convert it to hexadecimal, so you get A-F in there just like in your example.

webjunkie
  • 6,891
  • 7
  • 46
  • 43
3

A GUID is 128 bit. If you take these bits and don’t use a character set with just 16 characters to represent them (16=2^4 and 128/4 = 32 chacters) but a character set with, let’s say, 64 characters (like Base 64), you would end up at only 22 characters (64=2^6 and 128/6 = 21.333, so 22 characters).

Gumbo
  • 643,351
  • 109
  • 780
  • 844
2

Take your auto-increment ID, and HMAC-SHA1 it with a secret known only to you. This will generate a random-looking 160-bits that hide the real incremental ID. Then, take a prefix of a length that makes collisions sufficiently unlikely for your application---say 64-bits, which you can encode in 8 characters. Use this as your string.

HMAC will guarantee that no one can map from the bits shown back to the underlying number. By hashing an auto-increment ID, you can be pretty sure that it will be unique. So your risk for collisions comes from the likelihood of a 64-bit partial collision in SHA1. With this method, you can predetermine if you will have any collisions by pre-generating all the random strings that this method which generate (e.g. up to the number of rows you expect) and checking.

Of course, if you are willing to specify a unique condition on your database column, then simply generating a totally random number will work just as well. You just have to be careful about the source of randomness.

Emil Sit
  • 22,894
  • 7
  • 53
  • 75
0

A GUID is just a number

The latest generation of GUIDs (version 4) is basically a big random number*

Because it's a big random number the chances of a collision are REALLY small.

The biggest number you can make with a GUID is over:

5,000,000,000,000,000,000,000,000,000,000,000,000

So if you generate two GUIDs the chance the second GUID is the same as the first is:

1 in 5,000,000,000,000,000,000,000,000,000,000,000,000

If you generate 100 BILLION GUIDs.

The chance your 100 billionth GUID collides with the other 99,999,999,999 GUIDs is:

1 in 50,000,000,000,000,000,000,000,000

Why 128 bits?

One reason is that computers like working with multiples of 8 bits.

8, 16, 32, 64, 128, etc

The other reason is that the guy who came up with the GUID felt 64 wasn't enough, and 256 was way too much.

Do you need 128 bits?

No, how many bits you need depends on how many numbers you expect to generate and how sure you want to be that they don't collide.

64 bit example

Then the chance that your second number would collide with the first would be:

1 in 18,000,000,000,000,000,000 (64 bit)

Instead of:

1 in 5,000,000,000,000,000,000,000,000,000,000,000,000 (128 bit)

What about the 100 billionth number?

The chance your 100 billionth number collides with the other 99,999,999,999 would be:

1 in 180,000,000 (64 bit)

Instead of:

1 in 50,000,000,000,000,000,000,000,000 (128 bit)

So should you use 64 bits?

Depends are you generating 100 billion numbers? Even if you were then does 180,000,000 make you uncomfortable?

A little more details about GUIDs

I'm specifically talking about version 4.

Version 4 doesn't actually use all 128 bits for the random number portion, it uses 122 bits. The other 6 bits are used to indicate that is version 4 of the GUID standard.

The numbers in this answer are based on 122 bits.

And yes since it's just a random number you can just take the number of bits you want from it. (Just make sure you don't take any of the 6 versioning bits that never change - see above).

Instead of taking bits from the GUID though you could instead use the the same random number generator the GUID got it's bits from.

It probably used the random number generator that comes with the operating system.

Luis Perez
  • 27,650
  • 10
  • 79
  • 80
  • "If you generate 100 BILLION GUIDs. The chance your 100 billionth GUID collides with the other 99,999,999,999 GUIDs is 1 in 50,000,000,000,000,000,000,000,000" That doesn't sound right...isn't it more like (very approximately) 1 in 1,000,000,000,000,000? To have the chance of collision you quote, you'd only need around 500,000 guids. (https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions) – Stephen Drew Sep 08 '17 at 23:05
  • I see what you mean based on the formula in the wikipedia article. I can't identify where my logic is wrong though. Say I ask you to guess the call of a die, your chances are 1 in 6. If I let you guess 2 numbers before rolling the die your chances are 2 in 6 which can be reduced to 1 in 3. You can think of every guid you already have a guess in a game with a much larger die. In which case to get a 50% chance you would need 50% of all possible numbers which would be (2^122)/2. Which comes out to 2.6e36 while the article says you reach 50% chance of collision at 2.7e18. I guess I don't get it. – Luis Perez Sep 09 '17 at 00:51
  • To roll a die twice and not have a collision is a 5 in 6 chance. First, you roll the die. Then you roll again and have a 5/6 chance of not getting a collision. To roll the die three times and not have collision would be (5/6) * (4/6) = (20/36) and so on...ending up with a roughly 1.5% chance of being able to roll a die six times and get six unique numbers. – Stephen Drew Sep 09 '17 at 07:44
  • I think I get it now, thanks for taking the time to explain, I'll see about rewriting my answer, thanks! – Luis Perez Sep 09 '17 at 11:55
0

How long is too long? You could convert the GUID to Base 64, which ends up making it quite a bit shorter.

Kibbee
  • 65,369
  • 27
  • 142
  • 182
0

What you could do is something I do when I want exactly what you are wanting.

  1. Create your GUID.

  2. Get remove the dashes, and get a substring of how long you want your ID

  3. Check the db for that ID, if it exists goto step 1.

  4. Insert record.

This is the simplest way to insure it is obscured and unique.

Jeremy Boyd
  • 5,245
  • 7
  • 33
  • 57
0

I have just had an idea and I see Greg also pointed it out. I have the user stored in the session with a user ID. When I create my query I will join on the Users table with that User ID, if the result set is empty then we know he was hacking the URL and I can redirect to an error page.

uriDium
  • 13,110
  • 20
  • 78
  • 138
0

Late to the party but I found this to be the most reliable way to generate Base62 random strings in C#.

private static Random random = new Random();

void Main()
{
    var s = RandomString(7);
    Console.WriteLine(s);
}

public static string RandomString(int length)
{
    const string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
Soham Dasgupta
  • 5,061
  • 24
  • 79
  • 125