
Basically I'm trying to make my Django + PostgreSQL database perform better.

Say I have a table with first_name, last_name, state, address. For whatever reason I want first_name and last_name together to make my primary key. Doesn't have to make sense, it's just an example.

At first I used zlib.adler32() to hash the two strings into a BigIntegerField and used that as my primary key. I quickly discovered that the function is meant for something else (it is a checksum, not a way to produce unique keys), and I started getting collisions almost immediately.
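A minimal illustration of why Adler-32 collides so easily: it is a 32-bit checksum built from two running sums over the input bytes, so short inputs whose sums happen to match produce the same value. The two byte strings below are made-up examples picked to show the effect, not data from the actual table.

```python
import zlib

# Adler-32 keeps two running sums over the input bytes.
# Distinct short inputs whose sums match yield the same checksum.
first = zlib.adler32(b"ace")
second = zlib.adler32(b"baf")

print(first == second)  # the two different inputs share one checksum
```

Even without hand-picked inputs, a 32-bit value leaves little room: by the birthday bound, random 32-bit keys are more likely than not to collide once a table holds on the order of tens of thousands of rows.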

Currently I'm using hashlib.md5() to generate a hash into a 32 character CharField and use it as my primary key:

hashlib.md5(f'{first_name}{last_name}'.encode('utf-8')).hexdigest()

However, things slowed down a lot. I don't know whether that's because of the md5 algorithm or because the table now has a string primary key instead of an integer.
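One way to separate the two suspects is to time the hash on its own, outside the database. A rough sketch, assuming placeholder names and a hypothetical helper called `md5_key`:

```python
import hashlib
import timeit


def md5_key(first_name: str, last_name: str) -> str:
    """Return the 32-character hex MD5 digest used as the key (hypothetical helper)."""
    return hashlib.md5(f"{first_name}{last_name}".encode("utf-8")).hexdigest()


# Time 100,000 key computations; "Jane" and "Doe" are placeholder values.
runs = 100_000
seconds = timeit.timeit(lambda: md5_key("Jane", "Doe"), number=runs)
print(f"{seconds / runs * 1e6:.2f} microseconds per key")
```

If the per-key cost turns out to be small next to the time spent per row in the database, the slowdown more likely comes from the database side (for example, the wider text key and its index) than from MD5 itself.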

I know there is another option, the unique_together setting in the model's Meta class, but I've been trying to avoid any Django conveniences for performance reasons.
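For context, a multi-column uniqueness rule is enforced by the database itself through a unique constraint over both columns. The sketch below shows that constraint at the SQL level. SQLite from the standard library stands in for PostgreSQL only so the example is self-contained, and the table name `person` and the sample rows are assumptions based on the example columns above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        state      TEXT,
        address    TEXT,
        UNIQUE (first_name, last_name)
    )
    """
)

insert_sql = (
    "INSERT INTO person (first_name, last_name, state, address) "
    "VALUES (?, ?, ?, ?)"
)
conn.execute(insert_sql, ("Jane", "Doe", "NY", "1 Main St"))

# A second row with the same name pair violates the constraint.
try:
    conn.execute(insert_sql, ("Jane", "Doe", "CA", "2 Oak Ave"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(duplicate_rejected, row_count)
```

The primary key stays a plain integer here, and the uniqueness of the name pair is checked by the database on every insert without any hashing in application code.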

How would you recommend building the primary key? I'm looking for performance; it doesn't need to be human readable. If a BigIntegerField is much faster as a primary key, how do I generate a 64-bit integer from a variable-length string?
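As one possible sketch of the integer route: PostgreSQL's bigint (what BigIntegerField maps to) is a signed 64-bit value, so a hash can be truncated to 8 bytes and read as a signed integer. The helper name `name_key`, the choice of BLAKE2b, and the separator character are all assumptions made for illustration, not something taken from the setup described above.

```python
import hashlib


def name_key(first_name: str, last_name: str) -> int:
    """Derive a signed 64-bit integer from two strings (hypothetical helper)."""
    # A separator assumed never to occur in names keeps ("ab", "c")
    # and ("a", "bc") from producing the same hash input.
    data = f"{first_name}\x1f{last_name}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    # signed=True maps the 8 bytes into the range of a signed 64-bit column.
    return int.from_bytes(digest, byteorder="big", signed=True)


print(name_key("Jane", "Doe"))
```

A 64-bit hash makes collisions far less likely than a 32-bit checksum does, but it does not rule them out, so the insert path would still need to handle a duplicate-key error.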

Nikolay Dyankov
  • I would recommend using a UUID as the primary key. Are there any limitations that prevent you from using UUIDs as the primary key? – VJ Magar Mar 31 '21 at 04:58
  • No limitations, but doesn't UUID use md5 and other hash functions underneath, essentially what I'm doing right now? – Nikolay Dyankov Mar 31 '21 at 05:02
  • What are the slow operations against your table - read/write? A multi column index seems like a good choice, why the aversion to `unique_together`? – Iain Shelvington Mar 31 '21 at 05:03
  • @nikolay https://stackoverflow.com/a/13146662 check this out – VJ Magar Mar 31 '21 at 05:06
  • @VJMagar That helped a lot. From what I understand, for randomly accessing data it doesn't matter whether the key is a char or an int. Thanks – Nikolay Dyankov Mar 31 '21 at 05:18
  • UUIDs are not a very good choice for primary keys. First, they don't satisfy your need for a relation between the two fields (and your aversion to unique_together is misplaced: it is not a Django convenience, but a wrapper for a long-proven database feature). If you're willing to let go of that requirement, though, [read this](https://github.com/ericelliott/cuid#readme). –  Mar 31 '21 at 06:30
