0

I have a Postgres 9.3 table that has a column called id as PKEY, id is char(9), and only allow lowercase a-z0-9, I use Python with psycopg to insert to this table.

When I need to insert into this table, I call a Python function get_new_id(), my question is, how to make get_new_id() efficient?

I have the following solutions, none of them satisfy me.

a) Pre-generate a lot of ids, store them in some table, when I need a new id, I SELECT one from this table, then delete it from this table, then return this selected id. Down side of this solution is that it need to maintain this table, in each get_new_id() call, there will also have a SELECT COUNT in order to find out if I need to generate more ids to put into this table.

b) When get_new_id() gets called, it generate a random id, then pass this id to a stored procedure to check if this id is already in use, if no, we are good, if yes, do b) again. Down side of this solution is, when the table gets bigger, the failure rate may be high, and there is a chance that, two get_new_id() calls in two processes will generate the same id, say, 1234567, and 1234567 is not used a PKEY yet, so, when insert, one process will fail.

I think this is a pretty old problem, what's the perfect solution?

Edit

I think this has been answered, see Jon Clements' comment.

Not an ID
  • 2,579
  • 2
  • 21
  • 28
  • There are `1e14` different 9-character base-36 strings, so a "high" collision rate is doubtful. – NPE Dec 14 '14 at 10:58
  • You might be able to get away with a [custom sequence](http://www.postgresql.org/docs/8.3/static/sql-createsequence.html) , starting at an arbitrarily large number, with a quite large step (and possibly cycle) then put a trigger that calls `get_new_id` which takes the new PK and makes a base 36 representation of it. – Jon Clements Dec 14 '14 at 11:12
  • @JonClements Didn't think sequence, I think this does it, you should make this an answer. (Why do you say "might be able" in the first sentence, is there any gotcha in this solution?) – Not an ID Dec 14 '14 at 11:25
  • I'm not 100% sure. It was just a "off the top of my head" idea - so could well be a gotcha/impracticality involved. – Jon Clements Dec 14 '14 at 11:30
  • I'm gonna try it, thanks :) – Not an ID Dec 14 '14 at 11:31
  • See [How to generate random unique number in PostgreSQL using function](http://stackoverflow.com/questions/22908499). – Daniel Vérité Dec 16 '14 at 21:46
  • Also note that the collision probability for the purely random technique is higher than you may think. This is known as the [birthday paradox](http://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem) – Daniel Vérité Dec 16 '14 at 21:48

2 Answers2

1

Offtopic because you already have a char(9) datatype:

I would use an UUID when a random string is needed, it's a standard and almost any programming language (including Python) can generate UUIDs for you.

PostgreSQL can also do it for you, using the uuid-ossp extension.

Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
0
select left(md5(random()::text || now()), 9);
   left    
-----------
 c4c384561

Make the id the primary key and try the insert. If an exception is thrown catch it and retry. Nothing fancy about it. why only 9 characters? Make it the full 32.

Check this answer for how to make it smaller: https://stackoverflow.com/a/15982876/131874

Community
  • 1
  • 1
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • I need to display this id, UUID or longer version is not good looking. and I doubt left(md5(), 9) has a lower collision that just random string generation. – Not an ID Dec 14 '14 at 11:27
  • @NotanID a 9 character string does not have a lower collision chance than any other 9 character string, regardless of how you generate it. The only way you lower that chance is making it bigger and/or adding other chars like special or punctuation chars. _good looking_? What are you talking about? =| – Clodoaldo Neto Dec 14 '14 at 11:43
  • @NotanID And that is random. You missed the `random()` part. – Clodoaldo Neto Dec 14 '14 at 11:45