
After reading this blog post https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/

I wanted to know more about how Django generates UUIDs because I am using them as my pk. Well, according to the docs, https://docs.djangoproject.com/es/1.9/ref/models/fields/#uuidfield, Django is relying on the Python UUID module https://docs.python.org/3/library/uuid.html#uuid.UUID. But there are many kinds of UUID, and it is not at all clear to me which one is being generated in Django, or how to choose, assuming a choice is available.

Finally, given the fragmentation issue pointed out in the blog post, and assuming uuid_generate_v1mc is not available directly in Python or Django, is there a way to force them to use it?

bakkal
Malik A. Rumi

2 Answers

  • How does Django and or Python generate a UUID in Postgresql?

  • But there are many kinds of UUID, and it is not at all clear to me which one is being generated in Django

When you use UUIDField as a primary key in Django, it doesn't generate a UUID for you; you generate it yourself before you save the object.

I don't know if things have changed since, but the last time I used a UUIDField, you had to specify the UUID value yourself (e.g. when you create the object; Django won't let you save an object with a blank UUID and have the database generate one). The Django documentation samples reinforce this, because they pass default=uuid.uuid4 (the callable itself, not the result of calling it), e.g. for the primary key.

import uuid
from django.db import models

class MyUUIDModel(models.Model):
    # default is the callable uuid.uuid4; Django calls it for each new instance
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

Which UUID version to choose

For a comparison of the properties of the different UUID versions please see this question: Which UUID version to use?

For a lot of applications, UUID4 is just fine

If you just want to generate a UUID and get on with your life, uuid.uuid4() like the snippet above is just fine. UUID4 is a random UUID, and the chances of a collision are so remote that you don't really need to worry about them, especially if you're not generating a ton of them per second.
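To put a rough number on "remote": a version 4 UUID carries 122 random bits, and the birthday approximation p ≈ n² / (2 · 2^122) estimates the chance of at least one collision among n of them. The sketch below is only that approximation (reasonable while p is small), and the function name is made up for illustration.

```python
def uuid4_collision_probability(n):
    """Approximate chance of at least one collision among n random UUID4 values."""
    # 128 bits minus 4 version bits and 2 variant bits leaves 122 random bits.
    random_bits = 122
    # Birthday approximation: p ~= n^2 / (2 * N), with N = 2 ** random_bits.
    return n * n / (2 * 2 ** random_bits)


# One billion UUID4 values: the result is on the order of 1e-19.
print(uuid4_collision_probability(10 ** 9))
```

This assumes the values really come from a good random source, which is what uuid.uuid4() aims for.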

Finally, given the fragmentation issue pointed out in the blog post, and assuming uuid_generate_v1mc is not available directly in Python or Django, is there a way to force them to use it?

A Python UUID1 with random MAC address, like uuid-ossp's uuid_generate_v1mc

The blog you linked mentions the use of UUID1. Python's uuid.uuid1() takes a node parameter that is used instead of the default real hardware MAC address (48 bits). Because these random bits are at the end of the UUID1, the first bits of the UUID1 can be sequential/timestamp-based to limit the index fragmentation.

So

uuid.uuid1(random_48_bits)

Should get you similar results as uuid_generate_v1mc, which is a UUID1 with a random MAC address.
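As a small check of where the node ends up, the sketch below passes a fixed, easily recognizable node value (hypothetical, chosen only for illustration) and inspects the fields that Python's uuid.UUID exposes.

```python
import uuid

# Hypothetical fixed node value, chosen only so it is easy to spot in the output.
node = 0x010203040506
u = uuid.uuid1(node=node)

print(u)           # the last group of 12 hex digits is the node: ...-010203040506
print(u.version)   # 1
print(u.time)      # 60-bit timestamp in 100 ns intervals since 1582-10-15
print(u.time_low)  # low 32 bits of that timestamp, stored as the first group
```

The timestamp and clock sequence fill the rest of the value, so only the last 48 bits depend on the node you pass in.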

To generate 48 random bits, as a dummy example we can use:

import random
random_48_bits = random.randint(0, 2**48 - 1)

Try it:

>>> import uuid
>>> import random
>>> 2 ** 48 - 1
281474976710655
>>> uuid.uuid1(random.randint(0, 281474976710655))
UUID('c5ecbde1-cbf4-11e5-a759-6096cb89d9a5')

Now make a function out of it, and use it as the default for your Django UUIDField
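A minimal sketch of such a function follows. The name uuid1_random_node is made up, and two choices differ from the snippet above: it uses secrets.randbits instead of random.randint, and it sets the multicast bit on the random node, which I am assuming is what uuid_generate_v1mc does, so the value cannot be mistaken for a real network card address.

```python
import secrets
import uuid


def uuid1_random_node():
    """Return a version 1 UUID whose node is random instead of the real MAC address."""
    # 48 random bits stand in for the hardware address.
    node = secrets.randbits(48)
    # Set the multicast bit (least significant bit of the first octet).
    node |= 1 << 40
    return uuid.uuid1(node=node)


print(uuid1_random_node())
```

In a model, pass the function itself, without parentheses, as default=uuid1_random_node, the same way uuid.uuid4 is passed in the documentation sample. Django's migrations need to import the default, so the function should live at module level.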

Custom UUIDs, and an example from Instagram

Note that it's totally fine to come up with your custom UUID scheme, and use the available bits to encode information that can be useful to your application.

E.g. you may use a few bits to encode the country of a given user, a few bits with a timestamp, some bits for randomness etc.
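To make that concrete, here is a sketch of one possible layout. The bit widths, the function name custom_id and the numeric country code are all assumptions made up for illustration: 48 bits of millisecond timestamp, 10 bits of country code and 70 random bits, packed into a 128-bit integer.

```python
import secrets
import time
import uuid


def custom_id(country_code, now_ms=None):
    """Pack a timestamp, a country code and random bits into 128 bits (hypothetical layout)."""
    if not 0 <= country_code < 2 ** 10:
        raise ValueError("country_code must fit in 10 bits")
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = now_ms & (2 ** 48 - 1)  # 48 bits: milliseconds since the Unix epoch
    randomness = secrets.randbits(70)   # 70 bits: random filler
    # Timestamp goes in the most significant bits so later IDs compare greater.
    value = (timestamp << 80) | (country_code << 70) | randomness
    return uuid.UUID(int=value)


print(custom_id(33))
```

A value built this way does not carry the RFC 4122 version and variant bits, so it is a 128-bit identifier stored in UUID form rather than a standard UUID.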

You may want to read how Instagram (built on Django and PostgreSQL) cooked up their own UUID scheme to help with sharding.

gek
bakkal
  • RE: Twitter Snowflake: "We have retired the initial release of Snowflake and working on open sourcing the next version based on Twitter-server, in a form that can run anywhere without requiring Twitter's own infrastructure services." https://github.com/twitter/snowflake – Malik A. Rumi Feb 04 '16 at 20:43
  • What about the fragmentation issue? – Malik A. Rumi Feb 04 '16 at 20:45
  • @MalikA.Rumi Added a snippet showing how to generate a UUID1 with a random 48-bit MAC address, just like the `uuid_generate_v1mc` suggested by the blog to limit the index fragmentation. – bakkal Feb 05 '16 at 10:42
  • thanks, a very thorough answer. I look forward to whatever is coming with Twitter Snowflake, and I will try your approach to making a custom uuid. Also, note that today I came across this SO answer about performance: http://stackoverflow.com/questions/29880083/postgresql-uuid-type-performance. – Malik A. Rumi Feb 05 '16 at 18:55
  • What if I am generating a ton of UUIDs per second? For example, I'm adding a UUIDField to a model containing 93,000 records. When I run the migration, I keep getting an IntegrityError: the UUIDs are colliding. Is there any way to guarantee the generated UUID is unique? – nnyby Mar 04 '16 at 15:44
from django.db import migrations
from django.db.models import Func, UUIDField


class RandomUUID(Func):
    # PostgreSQL expression, evaluated once per row, that builds a UUID from an MD5 hash
    template = "uuid_in(md5(random()::text || clock_timestamp()::text)::cstring)"
    output_field = UUIDField()


def add_guid(apps, schema_editor):
    # Fill the guid column on all existing rows in a single UPDATE
    MyModel = apps.get_model("app", "MyModel")
    MyModel.objects.update(guid=RandomUUID())


class Migration(migrations.Migration):
    ...
    operations = [
        migrations.RunPython(add_guid, reverse_code=migrations.RunPython.noop),
    ]
Ivan