0

I need to generate many unique identifier on a distributed system.

  1. In paranoia mode, I wan't to be sure to :
    • never have collision
    • prevent the determination of the computer's location used to generate the identifier (Mac adresse and date time)

I think generate UUID.

  1. If I use UUID1 (based on MAC address, timestamp, etc...) :

    • I'm sure to never have collision
    • It's possible to find the location
  2. If I use UUID4 (based on random generator) :

    • It's possible to have collision (The chance of a collision is really, really, really small, but exist !)
    • I'm sure it's not possible to dermine the location (date and computer)

Do you have a solution to satisfy both constraints ?

Indent
  • 4,675
  • 1
  • 19
  • 35
  • 3
    You might want to check [this answer](https://stackoverflow.com/a/10951301/15498) which links to some Eric Lippert postings. Have you actually looked at the odds of an accidental collision under Version 4? And placed that alongside other "highly unlikely" scenarios? – Damien_The_Unbeliever Oct 06 '17 at 13:41
  • 3
    Or [this answer](https://stackoverflow.com/a/2977648/15498): "... the odds of a GUID collision are smaller than the odds of a cosmic ray flipping a bit in your computer's memory and screwing up the answer given by any "accurate" algorithm you'd care to run" – Damien_The_Unbeliever Oct 06 '17 at 13:43
  • Hello, I agree with the theory but just for fun I would have liked to find a **paranoia** solution. – Indent Oct 06 '17 at 13:52
  • The smallest possible chance to generate a duplicate is to generate a completely random UUID in as large a range as possible. UUID4 does exactly that. Assuming you can trust your (P)RNG, that's as good as it gets. Using any sort of predetermined value in the UUID (e.g. MAC address) namespaces individual nodes (assuming you have fewer nodes than you're using bits within the UUID for the namespacing), but it reduces the number of bits used for the random id, halving the available space with each bit… – deceze Oct 06 '17 at 14:11
  • I agree with you. the probability of collision is extremely low with UUID4 ( I have a better chance of winning 10 or more consecutive times the lottery than having a collision between two UUID4). I am trying to have an absolutely safe solution. (this is not a development I want to use in real production, just a question I have been asking for a long time ) – Indent Oct 06 '17 at 14:24
  • 1
    In a nutshell: there's always a possibility of collision using any decentralised algorithm. Even with UUID1 you're depending on the uniqueness of the MAC address and the correctness of the time (not a given). UUID4 is already pretty much as good as it gets. Statistics and (un)likeliness are a scary and unintuitive thing. You're just more paranoid than you need to be. – deceze Oct 06 '17 at 14:31
  • 2
    I'm voting to close this question as off-topic because StackOverflow's scope is limited to *practical*, answerable questions. Rejecting accepted best-practices (and the statistics underlying them) makes this innately an impractical request. – Charles Duffy Oct 06 '17 at 14:36
  • Using UUID1 on a my own cluster of computer, I can be sure that the Mac adress are unique. If each computer execute a single UUID generator, I think (_and I hope !_) that *all the UUID are unique !* _(I don't understand why the process ID our processor Id is not used by UUID1 specification)_ – Indent Oct 06 '17 at 14:38

2 Answers2

0

May be we can combine UUID1 with a "static permutation" function determine by a "secret" key? The function must be bijective.

inspired by : http://code.activestate.com/recipes/126037-getting-nth-permutation-of-a-sequence/

def getPerm(seq, index):
    "Returns the <index>th permutation of <seq>"
    seqc= list(seq[:])
    seqn= [seqc.pop()]
    divider= 2 # divider is meant to be len(seqn)+1, just a bit faster
    while seqc:
        index, new_index= index//divider, index%divider
        seqn.insert(new_index, seqc.pop())
        divider+= 1
    return "".join(seqn)


secret=123456**123456 # a big secret integer

secure_id=getPerm("af5f2a30-aa9e-11e7-abc4-cec278b6b50a",secret)
# "1-a-faa4cba8c5b0ae372a72-ec-1fb9560e" ==> Same character, just change order

secure_id=getPerm("aaaaaaa0-bbbb-cccc-xxxx-yyyyyyyyyy0y",secret)
# "c-b-axaxxybyyyx0ybacyaya-cy-caybay0y" ==> Same secret => Same permutation  (check position of caracter '0' and '-')

We can improve the permutation "obfuscation" by preserving the position of '-' caracter

Indent
  • 4,675
  • 1
  • 19
  • 35
  • 1
    The odds of `secret` being leaked I would place far higher than the odds of an accidental Version 4 collision - and such a leak breaks the "computer location" requirement. – Damien_The_Unbeliever Oct 06 '17 at 14:02
  • Well seen for UUID4. I 'have change for a UUID1. For the odds of secret, may be I can generate it randomly and use hash of the secret in prefix of the ID. By this way nobody no the secret ! but, by this way may be I reintroduce collision probability... – Indent Oct 06 '17 at 14:08
0

May be we can be simply encode the UUID, using Base64 to easy transfert/store the ID. Encode is also a bijective function.

import base64

def encode(string,key):
    encoded_chars = []
    for i in xrange(len(string)):
        key_c = key[i % len(key)]
        encoded_c = chr((256 + ord(string[i]) + ord(key_c)) % 256)
        encoded_chars.append(encoded_c)
    encoded_string = "".join(encoded_chars)
    return base64.urlsafe_b64encode(encoded_string)

def decode(encoded_string,key):
    decoded_chars = []
    string=base64.urlsafe_b64decode(encoded_string)
    for i in xrange(len(string)):
        key_c = key[i % len(key)]
        decoded_c = chr((256 + ord(string[i]) - ord(key_c)) % 256)
        decoded_chars.append(decoded_c)
    encoded_string = "".join(decoded_chars)
    return "".join(decoded_chars)

secure_id=encode("af5f2a30-aa9e-11e7-abc4-cec278b6b50a","secret")
# "1MuY2JfVppWQ08at2JKUo8qroMbF1Zmh1srGpJys1ZvFp5XV"

uuid=decode(secure_id,"secret")
# "af5f2a30-aa9e-11e7-abc4-cec278b6b50a"

By this way, I'have always the secret probleme.

(note : we don't need decode function)

Indent
  • 4,675
  • 1
  • 19
  • 35