12

I am trying to use a long as a unique ID within our C# application (not global, and only for one session) for our events. Do you know whether the following will generate a unique long ID?

public long GenerateId()
{
 byte[] buffer = Guid.NewGuid().ToByteArray();
 return BitConverter.ToInt64(buffer, 0);
}

Why not use a GUID directly? We think an 8-byte long is good enough.

Robert Harvey
5YrsLaterDBA
  • No, this will only generate a random `Int64` value. Please define "unique": within which range does it need to be unique? – Bobby Apr 15 '11 at 14:26
  • If your question is "how do I generate a random long (Int64) in .NET," though I think it's not, there's a duplicate question full of good answers here: http://stackoverflow.com/questions/677373/generate-random-values-in-c – joshua.ewer Apr 15 '11 at 14:27
  • possible duplicate: http://stackoverflow.com/questions/2867758/how-to-generate-a-long-guid – sloth Apr 15 '11 at 14:29

11 Answers

15

No, it won't. As highlighted many times on Raymond Chen's blog, the GUID is designed to be unique as a whole; if you cut out just a piece of it (e.g. taking only 64 bits out of its 128), it will lose its (pseudo-)uniqueness guarantees.


Here it is:

A customer needed to generate an 8-byte unique value, and their initial idea was to generate a GUID and throw away the second half, keeping the first eight bytes. They wanted to know if this was a good idea.

No, it's not a good idea. (...) Once you see how it all works, it's clear that you can't just throw away part of the GUID since all the parts (well, except for the fixed parts) work together to establish the uniqueness. If you take any of the three parts away, the algorithm falls apart. In particular, keeping just the first eight bytes (64 bits) gives you the timestamp and four constant bits; in other words, all you have is a timestamp, not a GUID.

Since it's just a timestamp, you can have collisions. If two computers generate one of these "truncated GUIDs" at the same time, they will generate the same result. Or if the system clock goes backward in time due to a clock reset, you'll start regenerating GUIDs that you had generated the first time it was that time.


I try to use long as unique id within our C# application (not global, and only for one session.) for our events. do you know the following will generate an unique long id?

Why don't you just use a counter?
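A minimal sketch of what such a counter could look like, assuming the IDs only need to be unique within one running process. The class and method names (`EventIdGenerator`, `NextId`) are made up for illustration:

```csharp
using System.Threading;

public static class EventIdGenerator
{
    // Last ID handed out; starts at 0, so the first ID returned is 1.
    private static long _lastId;

    // Interlocked.Increment is atomic, so two threads calling this
    // concurrently never receive the same value.
    public static long NextId() => Interlocked.Increment(ref _lastId);
}
```

Each call to `EventIdGenerator.NextId()` returns the next value in sequence, so uniqueness within the session follows from the counter itself rather than from probability.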

Matteo Italia
  • I agree with @Aliostad. A `UUID` or `GUID` does not guarantee uniqueness in any way...it's just *very* unlikely to be a duplicate. – Bobby Apr 15 '11 at 14:30
  • 5
    @Aliostad, @Bobby: correct in theory, irrelevant in practice. With the algorithm described in the article, you'd need two machines with the same MAC address generating the GUIDs at the same time (in theory down to the nanosecond) with the same clock sequence number. I'd say this is *very very* unlikely :) . GUIDs are engineered to be unique and to be treated as such, I'm quite sure that a lot of software would break if a duplicate GUID was generated. – Matteo Italia Apr 15 '11 at 14:39
  • `irrelevant in practice` same for a random Int64 bit number. Likelihood is 1 in 2^64... – Aliostad Apr 15 '11 at 14:46
  • @Matteo Italia: Yes, of course it is very unlikely, but I think we should always keep in mind that a `GUID` or `UUID` **does not guarantee** uniqueness. It is just *very very very ... very very* unlikely to be a duplicate. Treating it like it will always be absolutely unique without a doubt can lead to very bad things. And +1 for suggesting a counter and the added *pseudo*. – Bobby Apr 15 '11 at 14:52
  • 1
    @Aliostad: not for an Int64 generated cutting the first 8 bytes of a GUID. You would just get a timestamp, and if the clock resolution isn't really good you may get two identical int64 for subsequent events. – Matteo Italia Apr 15 '11 at 21:05
  • @Bobby: all your remarks are correct, thank you for the +1. :) – Matteo Italia Apr 15 '11 at 21:08
  • 1
    `Guid.NewGuid()` will provide a GUID 4, not a GUID 1. The provided link only considers type 1 GUIDs. Type 4 GUIDs don't have a timestamp but are (aside from the version field) completely random. – ckuri Jan 26 '21 at 08:54
  • @ckuri: yep, following several layers of delegations it looks like it's contractual it's a "random-style" GUID and not a "MAC-style" GUID. Nonetheless, I probably wouldn't even look at exactly what portion of it is random and what is the version field, and, if I needed to exploit its randomness, just hash it all and take the bytes I need (or, simpler, use a PRNG). – Matteo Italia Feb 08 '21 at 22:47
5

You cannot distill a 16-byte value down to an 8-byte value while still retaining the same degree of uniqueness. If uniqueness is critical, don't "roll your own" anything. Stick with GUIDs unless you really know what you're doing.

If a relatively naive implementation of uniqueness is sufficient then it's still better to generate your own IDs rather than derive them from GUIDs. The following code snippet is extracted from a "Locally Unique Identifier" class I find myself using fairly often. It makes it easy to define both the length and the range of characters output.

using System.Security.Cryptography;
using System.Text;

public class LUID
{
    private static readonly RNGCryptoServiceProvider RandomGenerator = new RNGCryptoServiceProvider();
    private static readonly char[] ValidCharacters = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789".ToCharArray();
    public const int DefaultLength = 6;
    private static int counter = 0;

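    public static string Generate(int length = DefaultLength)
    {
        var randomData = new byte[length];
        RandomGenerator.GetNonZeroBytes(randomData);

        // Size the builder for the requested length, not the default.
        var result = new StringBuilder(length);
        foreach (var value in randomData)
        {
            // Take the modulus over the full alphabet so the last character ('9') can also be chosen.
            // Note: the shared static counter is not thread-safe.
            counter = (counter + value) % ValidCharacters.Length;
            result.Append(ValidCharacters[counter]);
        }
        return result.ToString();
    }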
}

In this instance it excludes 1 (one), I (i), 0 (zero) and O (o) for the sake of unambiguous human-readable output.

To determine just how effectively 'unique' your particular combination of valid characters and ID length are, the math is simple enough but it's still nice to have a 'code proof' of sorts (Xunit):

    [Fact]
    public void Does_not_generate_collisions_within_reasonable_number_of_iterations()
    {
        var ids = new HashSet<string>();
        var minimumAcceptableIterations = 10000;
        for (int i = 0; i < minimumAcceptableIterations; i++)
        {
            var result = LUID.Generate();
            Assert.True(!ids.Contains(result), $"Collision on run {i} with ID '{result}'");
            ids.Add(result);
        }            
    }
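The "simple enough" math can be sketched with the birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where n is the number of IDs generated and N is the number of possible IDs. This assumes the IDs are drawn roughly uniformly, which is only an approximation for the generator above; `CollisionMath` and its members are names invented for this example:

```csharp
using System;

public static class CollisionMath
{
    // Number of distinct IDs for a given alphabet size and ID length.
    public static double IdSpace(int alphabetSize, int length)
        => Math.Pow(alphabetSize, length);

    // Birthday approximation of the chance that at least two of
    // idCount uniformly random IDs out of idSpace possibilities coincide.
    public static double CollisionProbability(double idCount, double idSpace)
        => 1.0 - Math.Exp(-idCount * (idCount - 1) / (2.0 * idSpace));
}
```

For a 32-character alphabet and the default length of 6, `CollisionMath.CollisionProbability(10000, CollisionMath.IdSpace(32, 6))` comes out at roughly 0.045. In other words, a test that demands zero collisions in 10,000 six-character IDs can be expected to fail on the order of a few percent of runs, so a longer ID or a smaller iteration count may be needed for a stable test.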
nathanchere
2

No, it won't. A GUID is 128 bits long and a long is only 64 bits, so you are dropping 64 bits of information, which allows two different GUIDs to produce the same long representation. While the chance is pretty slim, it is there.

Femaref
2

Per the Guid.NewGuid MSDN page,

The chance that the value of the new Guid will be all zeros or equal to any other Guid is very low.

So, your method may produce a unique ID, but it's not guaranteed.

Taylor Gerring
1
var s = Guid.NewGuid().ToString();
var h1 = s.Substring(0, s.Length / 2).GetHashCode(); // first half of Guid
var h2 = s.Substring(s.Length / 2).GetHashCode(); // second half of Guid
var result = (uint) h1 | (ulong) h2 << 32; // 8-byte value; very likely, but not guaranteed, to be unique
var bytes = BitConverter.GetBytes(result);

P.S. It's very good, guys, that you are chatting with the topic starter here. But what about answers that other users, like me, need?

alexkovelsky
1

Yes, this will most likely be unique, but since it has fewer bits than a GUID, the chance of a duplicate is higher than with a full GUID, although still negligible.

Anyway, GUID itself does not guarantee uniqueness.

Aliostad
  • For all practical intents and purposes, a GUID is unique. Mathematically speaking, you are correct, but the chances of a collision are so astronomically low that it's a moot point. – EdwardG Feb 17 '16 at 16:51
0

Generates an 8-byte Ascii85 identifier based on the current timestamp in seconds. Guaranteed unique for each second. 85% chance of no collisions for 5 generated Ids within the same second.

private static readonly Random Random = new Random();
public static string GenerateIdentifier()
{
    var seconds = (int) DateTime.Now.Subtract(new DateTime(1970, 1, 1, 0, 0, 0)).TotalSeconds;
    var timeBytes = BitConverter.GetBytes(seconds);
    var randomBytes = new byte[2];
    Random.NextBytes(randomBytes);
    var bytes = new byte[timeBytes.Length + randomBytes.Length];
    System.Buffer.BlockCopy(timeBytes, 0, bytes, 0, timeBytes.Length);
    System.Buffer.BlockCopy(randomBytes, 0, bytes, timeBytes.Length, randomBytes.Length);
    return Ascii85.Encode(bytes);
}
odyth
0

As already said in most of the other answers: No, you cannot just take a part of a GUID without losing the uniqueness.

If you need something that's shorter and still unique, read this blog post by Jeff Atwood:
Equipping our ASCII Armor

He shows multiple ways to shorten a GUID without losing information. The shortest is 20 bytes (with ASCII85 encoding).

Yes, this is much longer than the 8 bytes you wanted, but it's a "real" unique GUID...while all attempts to cram something into 8 bytes most likely won't be truly unique.

Christian Specht
0

Like a few others have said, only taking part of the guid is a good way to ruin its uniqueness. Try something like this:

var bytes = new byte[8];
using (var rng = new RNGCryptoServiceProvider())
{
    rng.GetBytes(bytes);
}

Console.WriteLine(BitConverter.ToInt64(bytes, 0));
Mike Goatly
0

In most cases bitwise XOR of both halves together is enough
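As a rough sketch of that idea, assuming a GUID from `Guid.NewGuid()` and using a hypothetical helper named `GuidFolding.FoldToInt64`, the two 8-byte halves can be read as `Int64` values and XORed together:

```csharp
using System;

public static class GuidFolding
{
    // Folds a 16-byte GUID into 8 bytes by XORing its two halves.
    // The result is a 64-bit value derived from the whole GUID; it is NOT
    // guaranteed unique, since many different GUIDs fold to the same value.
    public static long FoldToInt64(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();                 // always 16 bytes
        long firstHalf = BitConverter.ToInt64(bytes, 0);   // bytes 0..7
        long secondHalf = BitConverter.ToInt64(bytes, 8);  // bytes 8..15
        return firstHalf ^ secondHalf;
    }
}
```

Calling `GuidFolding.FoldToInt64(Guid.NewGuid())` yields a 64-bit value that depends on every byte of the GUID, but the collision odds are those of a 64-bit random number, not of a 128-bit GUID.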

  • 1
    Please expand upon your answer with an explanation. – DougM Mar 31 '20 at 03:36
  • Any nonempty subset of a random bit sequence is a random bit sequence. The sum of two random sequences is a random sequence. The sum of a random sequence and a constant is a random sequence. XOR is a bitwise sum. XORing together the constant part of the GUID with the non-constant part will give us a result with the maximum possible entropy. – Dimo Stoianov Apr 03 '20 at 00:04
0

Everyone in here is making this way more complicated than it needs to be. This is a terrible idea.

GUID 1: AAAA-BBBB-CCCC-DDDD
GUID 2: AAAA-BBBB-EEEE-FFFF

Throw away the second half of each GUID, and now you have a duplicate identifier. GUIDs are not guaranteed to be unique, and it's extremely awful to treat them as if they were. You shouldn't rely on a guarantee about what gets generated, and it's not hard to get around this. If you need unique identifiers for an object, entity, or whatever (take a database, the most common case, as an example), you should generate an ID, check whether it already exists, and insert it only if it doesn't. This is fast in databases, since most tables are indexed on ID. "Most." If you have some kind of small object list in memory, or wherever, you'd probably store the entities in a hash table of some kind, in which you could just look up whether the generated GUID already exists.

All in all, it really depends on your use case. With a database, look up the GUID first, and regenerate as necessary until you can insert the new item. This really only matters in relational databases that don't automatically generate IDs for items in their tables. NoSQL databases usually generate a unique identifier.
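For the in-memory case, the generate, check, and retry loop might look something like the sketch below. It assumes random 64-bit IDs tracked in a `HashSet<long>` for the lifetime of one session; `SessionIdRegistry` and `NextUniqueId` are illustrative names, not anything from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class SessionIdRegistry
{
    private readonly HashSet<long> _issued = new HashSet<long>();
    private readonly RandomNumberGenerator _rng = RandomNumberGenerator.Create();
    private readonly object _gate = new object();

    // Generates random 64-bit IDs until one is found that has not
    // been issued before in this registry, then records and returns it.
    public long NextUniqueId()
    {
        var buffer = new byte[8];
        lock (_gate) // keeps the check-and-insert atomic across threads
        {
            while (true)
            {
                _rng.GetBytes(buffer);
                long candidate = BitConverter.ToInt64(buffer, 0);

                // HashSet.Add returns false if the value is already present,
                // in which case we simply try again.
                if (_issued.Add(candidate))
                {
                    return candidate;
                }
            }
        }
    }
}
```

The trade-off is that the registry remembers every issued ID, so memory grows with the number of events. In a database the same pattern would typically lean on a unique index and a retry on constraint violation instead.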