I'm using Postgres as my database, with my application written in Django using its ORM API. I'm creating a high-quality random string of length 22 plus a prefix (24 characters total) to use as a PK for my objects, like this...

This is my key generation:

import base64
import uuid

def make_Key(app_prefix):
    # 16 random bytes -> 22 URL-safe base64 characters once the "==" padding is stripped
    return app_prefix + base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")

My question is: will I have issues with indexing or speed later on by doing this? I'm NOT after an int vs. UUID answer; UUIDs have more pros than cons for my system. What I need to know is whether I can make this more "indexable", or whether there is a better solution in Django for this (maybe there is nothing wrong with how I have done it?).

My model

class Image(models.Model):
    id = models.CharField(primary_key=True, max_length=28, unique=True,
                          default=get_key, editable=False)
    # etc
Prometheus
  • Perhaps consider *Sequential GUIDs*: [What are the performance improvement of Sequential Guid over standard Guid?](http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid) – Alex K. Aug 18 '15 at 15:07
  • It's generally not recommended to use a UUID as an index. They are non-sequential and cause memory issues on sorts and inserts. I would keep a standard auto-incremented integer as an index and have an additional field that would store your 'key'. – Chris Montanaro Aug 18 '15 at 15:08
  • @ChrisMontanaro An auto-incremented integer limits me and is not right for my setup. I get your point about incrementing, though, but ints are not the answer. I read about something Instagram did, which I don't fully understand: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram – Prometheus Aug 18 '15 at 15:14
  • @AlexK. Thank you for the link, very interesting read. – Prometheus Aug 18 '15 at 15:18
  • You don't want an int based solution, I hear you but check this out anyway: you could go for something like this: http://hashids.org/python/ - it will allow you to use a collision-free "hash" (it's not really a hash, but an encrypted value, so you can decrypt back to int). This way you can encrypt/decrypt in your views, and keep the standard PK in the database. More pros: you can define the alphabet used for encrypted values, the min length, it works on many platforms. Anybody who doesn't know the salt value won't be able to decrypt, so no order can be determined from the encrypted values. – henrikstroem Aug 18 '15 at 15:47

1 Answer

uuid.uuid4 gives you a randomly generated key, which is only good for equality (hash-style) lookups; it carries no ordering information. I would suggest prefixing your primary key with a timestamp, so that new keys are roughly sequential and fit a B+ tree index better.

You can take BSON's ObjectId as a reference.
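
As a rough illustration of the timestamp-prefix idea, here is a minimal sketch. The function name `make_time_prefixed_key`, the 12-hex-digit millisecond timestamp, and the 9 random bytes are all assumptions chosen for this example (they keep the key at 24 characters plus a 2-character prefix, which fits the question's `max_length=28`); they are not taken from the question or from any library. The timestamp is written as fixed-width lowercase hex rather than base64, because the URL-safe base64 alphabet does not sort in numeric order.

```python
import base64
import os
import time


def make_time_prefixed_key(app_prefix, now_ms=None):
    """Hypothetical helper: app prefix + 12 hex chars of time + 12 random chars."""
    if now_ms is None:
        # Milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    # Fixed-width lowercase hex: comparing the strings gives the same
    # result as comparing the underlying integers.
    time_part = format(now_ms, "012x")
    # 9 random bytes encode to exactly 12 URL-safe base64 characters,
    # so there is no "=" padding to strip.
    random_part = base64.urlsafe_b64encode(os.urandom(9)).decode("ascii")
    return app_prefix + time_part + random_part
```

Keys generated later then compare greater than earlier ones (for the same prefix), so inserts tend to land near the right-hand edge of the index instead of at random positions. The trade-off in this sketch is fewer random bits (72) than a full uuid4, and whether string order in the database matches this byte order depends on the column's collation.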

socrates
  • From reading this, is that kind of what Instagram did? http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram – Prometheus Aug 18 '15 at 15:16
  • There are some existing sequential timestamp-prefixed UUID libraries in Python. I have used the following one extensively: https://github.com/Tamarabyte/TimeUUID – Mark Galloway Aug 18 '15 at 15:31
  • @MarkGalloway Would it still work (stay sequential) if I appended a prefix to the end of that timestamp UUID, e.g. _1? – Prometheus Aug 18 '15 at 15:46
  • @OrbiterFleet, The last 64 bits generated are random so you could theoretically change them to your liking. – Mark Galloway Aug 18 '15 at 16:02