Generating a non-sequential ID/PK for a Django Model

Question

I'm on the cusp of starting work on a new webapp. Part of this will give users pages that they can customise in a one to many relationship. These pages naturally need to have unique URLs.

Left to its own devices, Django would normally assign a standard AUTOINCREMENT ID to a model. While this works fantastically, it doesn't look great and it also makes pages very predictable (something that isn't desired in this case).

Rather than 1, 2, 3, 4 I would like set-length, randomly generated alphanumeric strings (eg h2esj4). 6 spots of a possible set of 36 characters should give me over two billion combinations which should be more than enough at this stage. Of course if I could expand this at a later time, that would be good too.
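
As a rough check of the figures above, here is a small sketch that computes the size of the key space and estimates how likely a clash becomes as pages accumulate. The function names are made up for illustration, and the probability uses the usual birthday-bound approximation, which assumes keys are drawn uniformly at random.

```python
import math

ALPHABET_SIZE = 36   # 26 lowercase letters + 10 digits
KEY_LENGTH = 6


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct keys of the given length."""
    return alphabet_size ** length


def collision_probability(n_keys: int, space: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    n_keys uniformly random keys coincide."""
    return 1.0 - math.exp(-n_keys * (n_keys - 1) / (2.0 * space))


space = keyspace(ALPHABET_SIZE, KEY_LENGTH)
print(f"{space:,} possible keys")
for n in (1_000, 10_000, 100_000):
    print(n, f"{collision_probability(n, space):.4f}")
```

The key space is indeed just over two billion, but the estimate suggests a clash somewhere in the table becomes likely well before the space fills up, so some uniqueness check is still worth having.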

But there are two issues:

Random strings occasionally spell out bad words or other offensive phrases. Is there a decent way to sidestep that? To be fair I could probably settle for a numeric string but it does have a hefty hit on the likelihood of clashes.
How do I get Django (or the database) to do the heavy lifting on insert? I'd rather not insert and then work out the key (as that wouldn't be much of a key). I assume there are concurrency issues to be aware of too though if two new pages were generated at the same time and the second (against all odds) magically got the same key as the first before the first was committed.

I don't see this being a million miles different from how URL shorteners generate their IDs. If there's a decent Django implementation of one, I could piggyback off that.

As a note: the 'URL shorteners' usually generate sequential URLs :). — Michał Górny, Jul 31 '13 at 10:20

score 24 · Answer 1 · answered Sep 28 '10 at 12:24

There is a built-in Django way to achieve what you want. Add a field to the "custom page" model with primary_key=True and default= set to the name of your key-generation function, like this:

class CustomPage(models.Model):
    ...
    mykey = models.CharField(max_length=6, primary_key=True, default=pkgen)
    ...

Now, for every model instance page, page.pk becomes an alias for page.mykey, which is being auto-assigned with the string returned by your function pkgen() at the moment of creation of that instance.
Quick-and-dirty implementation:

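def pkgen():
    from base64 import b32encode
    from hashlib import sha1
    from random import random
    rude = ('lol',)  # substrings that must not appear in a key
    bad_pk = True
    while bad_pk:
        # On Python 3, sha1() needs bytes and b32encode() returns bytes,
        # hence the encode()/decode() calls.
        pk = b32encode(sha1(str(random()).encode()).digest()).decode().lower()[:6]
        bad_pk = False
        for rw in rude:
            if pk.find(rw) >= 0: bad_pk = True
    return pk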

The probability of two pages getting identical primary keys is very low (assuming random() is random enough), and there are no concurrency issues. And, of course, this method is easily extensible by slicing more characters from the encoded string.
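
If a clash does happen, the primary-key constraint is what rejects the second insert, so one option is to catch that error and try again with a new key. The sketch below shows the idea with the standard sqlite3 module rather than Django's ORM, so that it runs on its own; the table layout and the names random_key and insert_page are invented for the example. In Django the equivalent would presumably be catching django.db.IntegrityError around save().

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_key(length: int = 6) -> str:
    """Random key drawn from lowercase letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_page(conn, title, keygen=random_key, max_attempts=10) -> str:
    """Insert a row under a fresh random key, retrying on a key clash."""
    for _ in range(max_attempts):
        key = keygen()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO page (id, title) VALUES (?, ?)", (key, title)
                )
            return key
        except sqlite3.IntegrityError:
            continue  # key already taken, draw another one
    raise RuntimeError("could not find a free key")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id TEXT PRIMARY KEY, title TEXT)")
print(insert_page(conn, "first page"))
```

Because the database enforces uniqueness, two simultaneous inserts with the same key cannot both succeed; the loser simply retries.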

atomizer

I don't understand the point of b32encode and sha1 in this concept. Wouldn't a simple random choice of a list of characters generate just as random a result, with a lot less overhead (and code)? – Oli Sep 28 '10 at 15:46
@Oli you can generate any string you want; the point is that setting a callback function as the default is how you would assign the string as the PK. Seems like the right solution to me. +1 – Rasiel Sep 29 '10 at 21:42
In a reusable setting, it can't do collision checking. There can't be more than one instance of a Model with the same slug. This is a flaw in the `default` argument not being able to take additional information (to pass the class to the generator). – Oli Sep 30 '10 at 12:59
`random_key = lambda: '{k:032X}'.format(k=random.getrandbits(128))` – Paulo Scardine Jul 04 '13 at 22:55
`id = django.utils.http.int_to_base36(uuid.uuid4().int)[:length]` – Nour Wolf Oct 27 '18 at 16:06
the b32encode line generates an error: TypeError: Unicode-objects must be encoded before hashing – Little Brain Dec 17 '18 at 18:41
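
The simpler approach raised in these comments, picking characters directly from a list, might look like the sketch below. It keeps the name pkgen and the placeholder blocklist from the answer; the use of the secrets module and the length parameter are assumptions added for the example.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
RUDE = ("lol",)  # placeholder blocklist, as in the answer


def pkgen(length: int = 6) -> str:
    """Return a random key drawn character by character from ALPHABET,
    skipping any candidate that contains a blocked substring."""
    while True:
        pk = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not any(word in pk for word in RUDE):
            return pk


print(pkgen())
```

Since the function can be called with no arguments, it should still be usable as a `default=` callable in the same way as the original.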

Oli · Accepted Answer · 2019-07-10T17:45:31.670

10

Here's what I ended up doing. I made an abstract model. My use-case for this is needing several models that generate their own, random slugs.

A slug looks like AA##AA so that's 52x52x10x10x52x52 = 731,161,600 combinations. Probably a thousand times more than I'll need and if that's ever an issue, I can add a letter for 52 times more combinations.
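
The arithmetic can be checked in a few lines. One caveat, assuming the slug is built with random.sample as in the code in this answer: sample never repeats a character within a single call, so each pair is drawn without replacement and the reachable number of slugs is somewhat smaller than the headline figure.

```python
from math import perm

LETTERS = 52  # a-z plus A-Z
DIGITS = 10

# Each position chosen independently (repeats allowed within a pair).
with_repeats = LETTERS**2 * DIGITS**2 * LETTERS**2

# random.sample() never repeats a character within one call, so each
# pair is an ordered selection without replacement.
without_repeats = perm(LETTERS, 2) * perm(DIGITS, 2) * perm(LETTERS, 2)

print(f"{with_repeats:,}")            # the AA##AA figure quoted in the text
print(f"{without_repeats:,}")         # what two-character samples can reach
print(f"{with_repeats * LETTERS:,}")  # with one extra letter position
```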

Use of the default argument wouldn't cut it as the abstract model needs to check for slug collisions on the child. Inheritance was the easiest, possibly only way of doing that.

from django.db import models

import random
import string

class SluggedModel(models.Model):
    slug = models.SlugField(primary_key=True, unique=True, editable=False, blank=True)

    def save(self, *args, **kwargs):
        while not self.slug:
            # Pattern AA##AA: two letters, two digits, two letters.
            # random.sample() returns lists, so concatenate them before joining.
            newslug = ''.join(
                random.sample(string.ascii_letters, 2)
                + random.sample(string.digits, 2)
                + random.sample(string.ascii_letters, 2)
            )

            # The manager is reachable only through the class, not the instance.
            if not type(self).objects.filter(pk=newslug).exists():
                self.slug = newslug

        super().save(*args, **kwargs)

    class Meta:
        abstract = True

answered Sep 30 '10 at 12:38

Interesting. I've recently decided to move to a UUID generation approach for some pk's but I might consider this as well. Your fragment would actually work the same either way I think. Just replace the 4 lines you generate 'ret' with something like '''ret = uuid.uuid1()''' – Van Gale Oct 01 '10 at 20:48
I'm trying to use your method, but I get the "Manager isn't accessible via ClassName instances" error. How did you overcome that? – zsquare Feb 04 '11 at 18:29
This is an old thread, but one thing for anyone who stumbles upon this and is using MySQL to be wary of is that MySQL is by default case insensitive on string matching, so ids of "AB12AB" and "ab12ab" will both be found unless you explicitly tell MySQL to use case sensitive matching: http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html – umbrae Jul 06 '12 at 18:58
@Oli : Even though the combinations are enough for you, it might be a better idea to use a reduced character set (vowels eliminated to prevent bad words) and use 26+26+10-5 (lower+upper+number-vowels) = 57 characters in all 6 places, thus giving 34,296,447,249 combinations - about 50 times more. – user Jun 21 '14 at 06:52
Thanks for the code, @Oli. In `if self.objects.filter(pk=newslug).count()` you probably meant `if not`. Also use `type(self).objects` instead of `self.objects`, since instances can't access the Manager (the error that @zsquare pointed out). Also, FYI for others: in later versions of Django, using a custom primary key doesn't work too well on a model with ManyToMany fields (see [#25012](https://code.djangoproject.com/ticket/25012)) and gets more complicated if you try to roll back ([#24030](https://code.djangoproject.com/ticket/24030), [#22997](https://code.djangoproject.com/ticket/22997)) – Anupam Apr 06 '17 at 07:09
Adding to the comment above, [this post](http://stackoverflow.com/questions/33779439/how-to-write-migration-to-change-primary-key-of-model-with-manytomanyfield) is helpful if someone does take the route of using a custom primary key with m2m field(s) in the model – Anupam Apr 06 '17 at 07:15
Too much code for something that could be done with `newslug = ''.join(random.sample(string.letters + string.digits, 8))` – Ivan Castellanos Jul 10 '19 at 00:00
@IvanCastellanos I wanted a particular pattern, explained in the initial question. Thanks for your input nonetheless. – Oli Jul 10 '19 at 17:43
@Oli Yeah, but you can still spell bad words this way, stuff like "ki55me", "ki11me", "xb00bs", "wh00re", so this solution doesn't fulfill that requirement either. – Ivan Castellanos Jul 10 '19 at 19:14
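
The reduced character set suggested in the comments could be sketched as follows. This version drops vowels in both cases, which is an assumption about what was intended; that leaves 52 characters rather than the 57 counted in the comment (57 corresponds to removing only five vowels). The function name is made up for the example.

```python
import secrets
import string

VOWELS = set("aeiouAEIOU")

# Letters of both cases plus digits, minus vowels of both cases.
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits if c not in VOWELS
)


def vowel_free_slug(length: int = 6) -> str:
    """Random slug that cannot contain a vowel."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(len(ALPHABET), f"{len(ALPHABET) ** 6:,}")
print(vowel_free_slug())
```

As the last comment points out, digits standing in for letters can still produce unfortunate strings, so this reduces the problem rather than removing it.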

score 9 · Answer 3 · edited May 23 '17 at 12:10

Django now includes a UUIDField type, so you don't need any custom code or the external package Srikanth Chundi suggested. This implementation uses hex strings with dashes, so the text is pretty child-safe, other than 1337 expressions like abad1d3a :)

You would use it like this to alias pk to the uuid field as a primary key:

import uuid
from django.db import models

class MyModel(models.Model):
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    # other fields

Note, however, that when you're routing to this view in urls.py, you need a different regex as mentioned here, e.g.:

urlpatterns = [
    url(r'mymodel/(?P<pk>[^/]+)/$', MyModelDetailView.as_view(),
        name='mymodel'),
]

answered Mar 20 '17 at 21:53

metakermit

My comment on that answer carries here too. UUIDs are great for unique, near infinite IDs but they're pretty user-unfriendly. Consider —in the context of Django and the web— this is something that *will* be on display and *may* be manually transcribed, and random strings eventually spell out swear words. – Oli Mar 21 '17 at 12:00
What swear words can you spell out in the hexadecimal number system? Note the only available letters are a, b, c, d, e, f. I agree however that random lengthy strings might not suit every use case. – metakermit Mar 21 '17 at 12:03
You asked so: `B00B5`.. But yeah, HEX is definitely better. The hulking 36-char length is the real issue here. – Oli Mar 21 '17 at 12:07
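
The lengths discussed in these comments can be checked with the standard uuid module; this only illustrates the two string forms of a random UUID.

```python
import uuid

u = uuid.uuid4()
print(str(u), len(str(u)))  # canonical form: 32 hex digits plus 4 dashes
print(u.hex, len(u.hex))    # the same value without the dashes
```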

score 4 · Answer 4 · answered Sep 27 '10 at 13:23

Maybe you should look at Python's uuid module, which can generate long random strings. You can slice the result down to the number of characters you want, with a small check to make sure it's still unique after slicing.

The UUIDField snippet may help if you don't want the pain of generating the UUID yourself.
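
A minimal sketch of that slice-and-check idea, assuming the set of keys already in use is available in memory. The `existing` set and the name short_uuid are placeholders for the example; in a real app the check would be a database lookup.

```python
import uuid


def short_uuid(existing: set, length: int = 8) -> str:
    """Return the first `length` hex characters of a random UUID that
    is not already in `existing`, and record it there."""
    while True:
        candidate = uuid.uuid4().hex[:length]
        if candidate not in existing:
            existing.add(candidate)
            return candidate


taken = set()
print(short_uuid(taken), short_uuid(taken))
```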

Also have a look at this blog post

Srikanth Chundi

This doesn't really circumvent either of the two issues I highlight in the question. Granted `UUIDField` helps abstract some of the code away from my model but it's still outside the database (where I'd really like it) and still highly capable of spelling out rude words. – Oli Sep 27 '10 at 14:00

score 3 · Answer 5 · answered Sep 30 '10 at 20:31

Oli: If you're worried about spelling out rude words, you can always compare/search your UUIDField for them, using the django profanity filter, and skip any UUIDs that might be triggery.

Elf Sternberg

And pity the people who live in Scunthorpe...profanity filters are awkward beasts! – Little Brain Dec 17 '18 at 18:09

score 1 · Answer 6 · answered Jul 17 '16 at 17:59

This is what I ended up with, using UUID.

import uuid

from django.db import models


class SluggedModel(models.Model):
    slug = models.SlugField(primary_key=True, unique=True, editable=False, blank=True)

    def save(self, *args, **kwargs):
        if not self.slug:
            # First 16 hex characters of a random UUID.
            self.slug = uuid.uuid4().hex[:16]    # can vary up to 32 chars in length
        super(SluggedModel, self).save(*args, **kwargs)

    class Meta:
        abstract = True

Note that those 16 bytes are technically only 15 bytes of randomness, because they include the version number of the uuid. — Ketzu, Sep 20 '20 at 18:20
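
The fixed digit mentioned in this comment can be seen directly. Counting in hex characters, the one at index 12 of a version 4 UUID is the version number, so it falls inside the first 16 characters and adds no randomness.

```python
import uuid

h = uuid.uuid4().hex
print(h[:16])
print(h[12])  # version digit of a random (version 4) UUID: always "4"
```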

score 1 · Answer 7 · answered Oct 27 '18 at 16:15

Looking at the above answers, here is what I am using now.

import uuid

from django.db import models
from django.utils.http import int_to_base36


ID_LENGTH = 9


def id_gen() -> str:
    """Generates random string whose length is `ID_LENGTH`"""
    return int_to_base36(uuid.uuid4().int)[:ID_LENGTH]


class BaseModel(models.Model):
    """Django abstract model whose primary key is a random string"""
    id = models.CharField(max_length=ID_LENGTH, primary_key=True, default=id_gen, editable=False)

    class Meta:
        abstract = True


class CustomPage(BaseModel):
    ...
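
For trying the generator outside a Django project, here is a standalone sketch. to_base36 is a hypothetical stand-in for django.utils.http.int_to_base36, written for this example and not Django's own code.

```python
import uuid

DIGITS36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36 (lowercase)."""
    if n < 0:
        raise ValueError("negative numbers are not supported")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS36[rem])
    return "".join(reversed(out))


def id_gen(length: int = 9) -> str:
    """Leading `length` base-36 digits of a random UUID's integer value."""
    return to_base36(uuid.uuid4().int)[:length]


print(id_gen())
```

One thing to be aware of: the slice takes the most significant digits of the number, and the very first digit of a 128-bit value in base 36 is not uniformly distributed. Slicing from the end of the string instead would likely avoid that bias.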
