2

Can I put a MAX value for the database table primary key, either via JPA or at the database level? If it is not possible, then I was thinking about

  1. Create a random key between 0-9999999999 (9999999999 is my MAX)
  2. Do a SELECT on the database with the newly create key, if return object is null, then INSERT, if not repeat go back to step 1

So if I do the above, two questions. Please keep in mind that the environment is high concurrent:

Q1: Does the overhead of check with SELECT, if not there, INSERT significant? What I really mean is: is this process normal, since usually I let the DB create a unique PK for me?

Q2: If Q1 does not create significant performance degradation, can I run into concurrent issue? For example, if P1 with Id1 check the table, Id1 is not there, it ready to insert, P2 sneak in insert Id1 before P1 could. So when P1 insert Id1, it fails. I dont want the process to fail here, I want it to go back up the loop, find a new id, repeat the process. How do I do that?

My environment is SQL and MYSQL db. I use JPA with Eclipselink implementation

NOTE: Some people question my decision to implement it this way, the answer is exact what TravisJ suggest below. I have a very high concurrent environment. When a process kick off, I need to create a request to another process, passing to that process a unique 10 character long id. Since the environment is high current, I want to leverage the unique, not null feature of PK. The request contain lot of information in it, so I create aRequest table, with the request Id as my PK. I know since all DB index their PK, query out the PK is fast. If there are better way, please let me know.

Thang Pham
  • 38,125
  • 75
  • 201
  • 285
  • 3
    Why would you want to do such a thing? – Ben Apr 04 '12 at 20:57
  • 1
    There virtually exists **no** reason why you would want that. It looks like you're abusing the role of a PK, which indicates you're doing something wrong. Long story short: just don't do it. – N.B. Apr 04 '12 at 20:58
  • 2
    I disagree that there is no reason. This could be used to create some sort of hash for a primary key. The primary concern brought up seems to be race conditions, which would mean the different processes would need a semaphore. – Travis J Apr 04 '12 at 21:00

3 Answers3

2

You can implement a Check Constraint in your table definition:

CREATE TABLE P
(
P_Id int PRIMARY KEY NOT NULL,
...
CONSTRAINT chk_P_Id CHECK (P_Id>0 and P_Id<9999999999)
)

EDIT: As stated in the comments, MySql does not honor CHECK constraints. This is a 6-year old defect in the bug log and the MySql team has yet to fix it. As MySql is now overseen by Oracle Corp, it may never be fixed (simply considered a "documented limitation", and people who don't like it can upgrade to the paid DBMS). However, this syntax, and the check constraint feature itself, DO work in Oracle, MS SQL Server (including SQLExpress/MSDE), MS Access, Postgre and SQLite.

KeithS
  • 70,210
  • 21
  • 112
  • 164
  • 1
    This syntax works on MySql, Oracle, MSS and MS Access for sure. http://www.w3schools.com/sql/sql_check.asp The related syntax for ALTER TABLE works on all of those DBs too. I can't speak for SQLite or PostgreSQL. – KeithS Apr 04 '12 at 21:11
  • 1
    Check constraints [_do not work_ in MySQL](http://stackoverflow.com/questions/2115497/check-constraint-in-mysql-not-working). They fail silently. – Ben Apr 04 '12 at 21:16
  • Edit: Now I can speak for Postgre and SQLite, it does work there too. – KeithS Apr 04 '12 at 21:17
  • @Ben: Wow. That's extremely odd that MySql would allow you to define something it never checks. That would be 15 kinds of confusing to a dev. – KeithS Apr 04 '12 at 21:18
  • @KeithS, it's been discussed in detail [all over](http://stackoverflow.com/questions/9254229/mysql-check-constraint-with-date) [SO](http://stackoverflow.com/questions/5807231/mysql-check-constraint-alternative) and [DBA.se](http://dba.stackexchange.com/questions/9662/check-constraint-does-not-work). I'm yet to find anyone who disagrees with you on that! – Ben Apr 04 '12 at 21:19
  • Thank you the suggestion, KeithS. I think my prod DB will be SQL, so I think I can still you it. Can you take a look at `Q2`: Is there a specific exception for duplicate primary key? Thank you – Thang Pham Apr 04 '12 at 21:24
  • @ThangPham, SQL is not a database system... If it's MySQL this will not work. If it's anything else Keith has suggested then you're fine. – Ben Apr 04 '12 at 21:26
  • @Ben: sorry, when I said SQL, I mean Microsoft SQL server. Sorry. Quick question Ben, when I insert row entry with a duplicate PK, is there an exception I can catch to detect duplicate insertion? – Thang Pham Apr 04 '12 at 21:39
2

Why not start at 1 and use auto-increment? This will be much more efficient because you will not get collisions, which you must cycle through. If you run out of numbers, you will be in the same boat either way, but at least going sequentially, you won't have to deal with collisions.

Imagine trying to find an unused key when you have used up 90% of your available numbers. That will take some time, and there is always a possibility that it never (in your lifetime) finds an unused key if you are generating them randomly.

Also, using auto-increment, it's easy to tell if you're close to the limit (SELECT MAX(col)). You could script an alert to let you know when you need to reset. For the random method, what would that query look like?

If you're using InnoDB, then you still might not want to use a primary key. Inserting random records into a clustered index is a performance hit since the actual table data must be reordered. Instead use a unique key (with an auto-increment primary key).

Using a unique index on the column in question, simply generate a random number in the range and attempt to insert it. If the insertion fails, then generate a new number and try again. If the insert succeeds, then proceed. This accounts for the concurrency issue.

Still, the sequential auto-increment key is going to yield better performance.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • The thing is we plan to do database clean up every three months. So keys are being deleted off the database. So in an auto-increment key, does the database key ever wrap around, let say that it is at MAX (9999999999), but key from 0 - 50000000 are now free since we deleted off the database, does the database know to wrap the key around for the next insert? – Thang Pham Apr 05 '12 at 01:17
  • @ThangPham, to start the auto-increment column at 1 again after you wipe the database, simply perform `ALTER TABLE tbl AUTO_INCREMENT = 1`. – Marcus Adams Apr 05 '12 at 10:50
1

See,

http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing

and,

http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Advanced_Sequencing

JPA already has good Id generation support, it does not make sense to implement your own.

If you are concerned about concurrency and performance, and using MySQL, I would recommend using TABLE generator with a large preallocation size (on other databases I would recommend SEQUENCE generator). If you have a lot of data, ensure you use a long for your id.

If you really think you need more than this, then consider UUID id generation. EclipseLink 2.4 with provide a @UUIDGenerator.

James
  • 17,965
  • 11
  • 91
  • 146
  • this is exactly what I am looking. However, I have couples other questions on this. My database is Microsoft SQL server, which I dont think support `@SequenceGenerator`, so I have to use `@TableGenerator`, so if you can take a look at this post, I have couples question on it, http://stackoverflow.com/questions/10033727/what-are-the-difference-between-sequence-id-using-jpa-tablegenerator-generat . Thank you very much – Thang Pham Apr 05 '12 at 18:43