
I would like to create a "collision-free" unique ID in Go for a highly scalable application.

Wikipedia recommends a namespace variation of the UUID (which I can only assume refers to version 3 or 5). Wikipedia specifically states:

Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.

I'm having a few difficulties with this:

  1. Versions 3 and 5 require hashing the data, which seems unnecessary for the following reasons:

    1.1. The application might use the same data more than once (and I would like a different ID each time).

    1.2. I assume that, in data-leakage terms, the entropy of the internal random() is considered secure, so I don't understand why cryptographic hashing is needed (I would guess hashing takes far more resources than some seed calculations).

  2. The namespace should be a value that protects against collisions that might arise in a high-concurrency environment. In Go, goroutines can run in parallel and might use the same seed due to high server performance (as Wikipedia mentions). I assume the best value for the namespace is the ID of the goroutine, so that collisions are avoided on the same machine. I cannot find any proper way to retrieve a unique ID for the currently executing goroutine.

  3. If Wikipedia in fact refers to version 4 (random) with a namespace component, how do I generate such a GUID? The docs do not show this kind of option.

TL;DR: How do I properly, securely, and scalably generate unique IDs in Go?

HLL
  • If you vote down please comment... – HLL Dec 22 '14 at 00:01
  • Just for the LOLs, this library (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/util/uuid.go) actually performs a Sleep in order to overcome this possible issue – HLL Dec 23 '14 at 19:41

1 Answer


The doc states that: func NewRandom() - returns a Random (Version 4) UUID or panics. The strength of the UUIDs is based on the strength of the crypto/rand package.

This means the package uses the cryptographically strong random source in crypto/rand to generate version 4 UUIDs. Personally, provided there are no bugs in the implementation, I'd trust that unless I were generating billions of IDs daily.
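To make concrete what "based on the strength of crypto/rand" amounts to, here is a minimal standard-library sketch of building a version 4 UUID by hand. The helper name newV4 is hypothetical and this is not the library's own code; it only illustrates the general recipe (16 random bytes, then the version and variant bits are overwritten).

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newV4 is a hypothetical helper (not part of any uuid package): it returns
// a random (version 4) UUID string built directly from crypto/rand.
func newV4() (string, error) {
	var b [16]byte
	// All of the uniqueness comes from these 16 crypto-random bytes.
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits (10xxxxxx)
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newV4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

No clock or user-supplied seed is involved here, so two goroutines calling this at the same instant still read independent random bytes from the operating system.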

Another option is to use Version 5: func NewSHA1(space UUID, data []byte) UUID, feeding it a Version 1 UUID as the namespace and bytes from crypto/rand as the "data", i.e. something like this:

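package main

import (
    "crypto/rand"
    "fmt"

    // Assumed import path: github.com/pborman/uuid is taken here to be the
    // current home of the go-uuid package whose docs are quoted above.
    "github.com/pborman/uuid"
)

// namespace is a static Version 1 (time-based) UUID for this machine, say.
var namespace = uuid.NewUUID()

// NewNamespacedRandom generates a Version 5 UUID from the global namespace
// and 16 bytes of crypto-random data.
func NewNamespacedRandom() (uuid.UUID, error) {
    // read 16 crypto-random bytes
    rnd := make([]byte, 16)
    if _, err := rand.Read(rnd); err != nil {
        return nil, err
    }
    return uuid.NewSHA1(namespace, rnd), nil
}

func main() {
    u, err := NewNamespacedRandom()
    if err != nil {
        panic(err)
    }
    fmt.Println(u)
}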
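For readers wondering what the hashing step in a Version 5 UUID actually does, the following is a rough standard-library sketch of the RFC 4122 name-based recipe: SHA-1 over the namespace followed by the data, truncated to 16 bytes, with the version and variant bits overwritten. The helper name newV5 and the fixed example namespace are assumptions made for illustration; this is not the uuid package's implementation.

```go
package main

import (
	"crypto/rand"
	"crypto/sha1"
	"fmt"
)

// newV5 is a hypothetical helper: it derives a version 5 UUID from a
// 16-byte namespace and arbitrary data, following the RFC 4122 recipe.
func newV5(namespace [16]byte, data []byte) [16]byte {
	h := sha1.New()
	h.Write(namespace[:]) // the namespace is hashed first
	h.Write(data)         // then the name/data bytes
	sum := h.Sum(nil)     // 20-byte SHA-1 digest

	var u [16]byte
	copy(u[:], sum)             // keep only the first 16 bytes
	u[6] = (u[6] & 0x0f) | 0x50 // set the version nibble to 5
	u[8] = (u[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return u
}

func main() {
	// Example namespace only: the DNS namespace UUID listed in RFC 4122.
	ns := [16]byte{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
		0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

	rnd := make([]byte, 16)
	if _, err := rand.Read(rnd); err != nil {
		panic(err)
	}
	u := newV5(ns, rnd)
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

The function is deterministic: the same namespace and data always give the same UUID, so any uniqueness has to come from the inputs, which is why the example above feeds it random bytes.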
Not_a_Golfer
  • Thanks, but it still seems an ugly and not very satisfying way to do this... The potential for this system is millions of entries a day, but still, why do I need to rely on statistics when there is a sound way to do it (disregarding statistics, or at least for the most part)? If I were to design the package, I'd give goroutines an automatic unique ID (namespace) combined with the machine MAC and a random value based off the standard seed. This would have given me complete peace of mind... – HLL Dec 22 '14 at 00:26
  • @HLL ugly is a matter of taste. As I said, personally I wouldn't bother going beyond simple version 4 UUIDs (as I have done in the past), but if I did, this approach is anything but ugly, and the random reading is exactly how the version 4 UUIDs are generated inside the uuid library. – Not_a_Golfer Dec 22 '14 at 09:29
  • @HLL From http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates: "after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%". Even if these numbers are orders of magnitude off, I think you'll be ok. – Intermernet Dec 22 '14 at 13:23
  • I'll go with random, but I still don't understand. The question is then a different one: "What is the rationale behind the lack of ability to generate a UUID based on goroutine ID + machine MAC + time seed, which theoretically should provide a collision-free UUID under all circumstances?"... – HLL Dec 23 '14 at 13:27
  • @Intermernet, they are referring to a version 4 UUID with different seed values; for high-performance systems, the last paragraph says what I've quoted (which also makes sense: "What if there were 2 calls of NewRandom within a very, very tiny dT, assuming the seed is the computer clock?")... – HLL Dec 23 '14 at 13:32
  • @HLL going back to my example: when you launch a new goroutine you can create a time-based version 1 UUID and use it as the namespace for all UUIDs this goroutine will generate using the random version. Won't that satisfy your needs? – Not_a_Golfer Dec 23 '14 at 13:43
  • @Not_a_Golfer As far as I can understand, uuid.NewUUID (used as the so-called namespace) is prone to the same collision opportunities as every UUID. A namespace should be guaranteed unique; in a threaded environment I would have acquired the thread ID and used that as the namespace. BTW, it's not so much about satisfying my needs, as the question is logical/conceptual in essence. – HLL Dec 23 '14 at 19:37
  • @HLL they are based on machine ID and time at the least, so if you have one goroutine starting the other goroutines and generating the "namespaces" sequentially (so the times never collide on the same machine), they cannot collide between machines or between goroutines, the way I see it. – Not_a_Golfer Dec 23 '14 at 21:22
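
The duplicate-probability figure quoted in the comments can be sanity-checked with the usual birthday-bound approximation, p ≈ 1 - exp(-n² / (2 · 2^bits)). The sketch below assumes 122 random bits for a version 4 UUID (128 bits minus the 6 fixed version and variant bits); the function name collisionProbability and the "ten million a day" workload are illustrative assumptions, not figures from the thread.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance of at least one duplicate
// among n values drawn uniformly at random from 2^bits possibilities,
// using the birthday-bound approximation 1 - exp(-n^2 / (2 * 2^bits)).
func collisionProbability(n float64, bits int) float64 {
	x := n * n / (2 * math.Pow(2, float64(bits)))
	// -Expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
	return -math.Expm1(-x)
}

func main() {
	const v4RandomBits = 122 // assumption: 128 bits minus 6 fixed bits
	const secondsPerYear = 365.25 * 24 * 3600

	// One billion UUIDs per second for 100 years (the quoted scenario).
	billionPerSecond := 1e9 * secondsPerYear * 100
	// Ten million UUIDs per day for 100 years (a hypothetical workload).
	tenMillionPerDay := 1e7 * 365.25 * 100

	fmt.Printf("1e9/s for 100 years:   p = %.3g\n",
		collisionProbability(billionPerSecond, v4RandomBits))
	fmt.Printf("1e7/day for 100 years: p = %.3g\n",
		collisionProbability(tenMillionPerDay, v4RandomBits))
}
```

Under these assumptions the first scenario lands in the same ballpark as the quoted figure, while a "millions per day" workload comes out many orders of magnitude smaller. The approximation only holds if the random source is sound, which is the caveat the Wikipedia passage in the question is making.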