0

I have below c# code which return

    public static void Main()
    {
        //var  bytes = "7A5EC53415E34F288269EBB05B73AFD4".ToByteArray(); 
        for(int i=0;i<1000;i++)
        {
            var a = Guid.NewGuid().ToString("N").ToUpper();
            Console.WriteLine($"{a}->{a.GetHashCode()}");
        }
    }

Which results

C5D9DB1ACBFB43E2B0DC2812F22C3899->          -1816084758
42B7C0BF6C0341DB96DE5084BA403193->          1197814565
B4062E3C129E478BA8E69552DDC700F2->          1349563395
863C9FBF0369496E90A1B0246F855D6E->          -772372816
B42562EDB97346F48DE37ADE3FB6620E->          2019192158

My Question here is, will the return values be a unique value so that I can use it for generating some random unique number? Sometime GetHashCode()return negative value and applying the Absolute will be considered a bad idea since i need without sign.

Coke
  • 965
  • 2
  • 9
  • 22
Biju
  • 45
  • 1
  • 9
  • 1
    Why not use `Random` for generating a random number? Anyway you might want to look up the "pigeonhole principle". Like all hashes, `GetHashCode` is not guaranteed unique. – ProgrammingLlama Apr 06 '20 at 02:41
  • if I use `Random` from multiple server what is the probability of generating the same series of numbers? I want to ensure the number is unique in this scenarios – Biju Apr 06 '20 at 02:49
  • Assuming you generate a value between `int.MinValue` and `int.MaxValue` then I would assume that it's just as good as your code above, if not better. I wouldn't trust a 32-bit integer to be guaranteed random between multiple servers though, or even between random numbers generated on a single server. What is your use case for the random number? – ProgrammingLlama Apr 06 '20 at 02:52
  • 2
    While `Guid.NewGuid()` should be unique, and is usually pretty good, if you are generating lots of them, you'll still get collisions. `string.GetHashCode()` is even less likely to be unique. Random numbers from a large enough address space, *should* be unique, but you will still have to cope with duplicates. – Jeremy Lakeman Apr 06 '20 at 02:53
  • 1
    Go read up on the birthday paradox, if you haven't already. – Jeremy Lakeman Apr 06 '20 at 02:54
  • See also https://stackoverflow.com/questions/5608980/how-to-ensure-a-timestamp-is-always-unique/14369695#14369695 – Ian Mercer Apr 06 '20 at 03:06
  • 1
    The `GUID` combinations are 2^122, while the `int` combinations are 2^32. So there's no way that all GUIDs will have different hashes. If you want only positive numbers, instead of making negative values positive (which will of course give a you half the combinations, a total of 2^31), use a `uint` instead. But if 2^32 combination is good for you, that's something you need to determine. – Andrew Apr 06 '20 at 03:16
  • 1
    I wonder if you have an XY problem? – Mitch Wheat Apr 06 '20 at 03:44
  • Why not to use UNIX timestamp as a unique value? – Coke Apr 06 '20 at 04:23
  • 1
    _"will the return values be a unique value"_ -- no. hash codes are, by definition, _never_ unique. And a 32-bit hash code is so susceptible to collisions, you should _never_ use them where a significant number of collisions would be a problem. Even longer hash codes can be forced to collide (see MD5), and even secure, long hash codes (e.g. SHA256) can in theory collide, even though in practice that's basically unheard of. See marked duplicate. – Peter Duniho Apr 06 '20 at 04:33
  • @john This is for generating a unique identifier which should be a 15 length numeric only value(requirement). Current solution we re using is Database sequencing (Oracle) but we are moving away from that.I am looking for a best solution in .net whic should work in a distributed environment but meet the uniqueness. A possible value can be 000123456789036 – Biju Apr 06 '20 at 13:44
  • 1
    I would suggest re-phrasing your question... Based on your comments, it seems to be something like: How can I guarantee that a generated value is unique when it's running across multiple processes? ... And then use your GUID attempt as an example of something you've tried but aren't sure if it will work. – Brian MacKay Apr 07 '20 at 13:33

1 Answers1

3

Sorry, no. It's harder than that, especially since you mentioned in the comments that this runs across multiple processes.

Ideas that won't work

GUIDs are somewhat unique across machines (in C# they use the MAC address and other strategies), but GetHashCode() is a big problem since that is in no way guaranteed to be unique.

You also can't rely on Random() for multiple reasons:

  • It uses Environment.Tick for its seed, and could indeed result in the same seed twice if called within a very narrow interval (low tens of milliseconds - 16ms?) from different processes.
  • Even if the seed was unique, some of the resulting values won't be. At least not at scale. We are looking for uniqueness, not just randomness.

One idea that will definitely work

If you want to be absolutely certain of uniqueness in a mechanical way across distributed processes running on different machines (or, worse, the same machine...), you could delegate the creation of unique values to a service. That service could save all unique values to a database table, forever. Or for some period of time if your requirements allow... That would be much easier.

Any time the service generates a new value, it would check the table first. If that value was ever used in the past, try again until a unique is found. Then save it to the database and return the value.

Doing this efficiently at scale would probably require a bit of engineering. There may be a more elegant purely mathematical solution, but this would be reliable and easy to understand (it does not rely on magic).

Brian MacKay
  • 31,133
  • 17
  • 86
  • 125
  • To be a bit picky :) The idea that will definitely work may not work if all values within the integer range have been assigned to a string. At the end of the day, there can only be 4,294,967,295 pairs in the table like how we run out of IPv4 addresses. – weichch Apr 06 '20 at 04:31
  • @weichch I suppose that is true... If billions of values is not enough, you could switch to a different data type. However, someone might still come along and point out that even that has theoretical limits. But let's assume from here that scale issues can all be solved with more engineering, and that people will try to stop somewhere around what's appropriate for their problem. :) Otherwise we might end up writing a book, not an SO answer. – Brian MacKay Apr 06 '20 at 15:38