2

*Skip to bold text for less explanation

I have been reading a couple of articles and threads on the pros/cons of using UUID/GUID vs. auto-incremented integer for primary key (or unique constraint) such as this one and UUID performance in MySQL?

My scenario is: Client generates an order that MUST be unique. Client sends order through semi-bad wifi or 3G to web-server that handles the request by PHP, inserts order into database and tells client that everything went well. Client flushes cache and starts rebuilding it with new orders until it can be emptied again, which is usually instantly, but that has nothing to do with the problem, which is:

The "semi-bad wifi or 3G"-part, since this will occasionally result in duplicate order entries in the database in cases where the client is not successfully informed that the order insert was successful, and it retries the same order.

The database will of course just auto increment the identical order with a new PK, so an auto_increment-only solution does not work.

My current solution is that the client is registered with a unique ID (just integer) that's given out by the server as the client registers as a client. The client then does its own auto_increment by just starting at 1 and going until the end of time for every order. The orders-table then has a composite unique index consisting of two integer rows, one for client id and one for the clients own order id.

This approach works, for now, but I have two concerns:

  1. How badly does a composite index of two integers in random order affect insert performance?
  2. What happens in the case of a client rollback, such as if the client restores from a backup, which will mess up the clients own order-id-system? Well I can answer that myself: Actual orders will be treated as inserted but then thrown out (INSERT IGNORE), since the database will think it already has them.

So, my new solution was to generate UUID/GUIDs client side and use those to prevent duplicate inserts. I then came to the conclusion that GUIDs alone would eventually - as the table grows - affect insert performance severely compared to using auto_increments. To prevent this negative effect, I thought of the following, which is the actual question:

Since all orders are marked by date, and a date will only insert so many orders (in this system rarely more than a thousand per day), and I'm not concerned with using GUIDs as identifiers or for merging purposes - I intend to keep the auto_increment integer as PK and order identification - would it be a good idea to combine the date with a GUID, with the date being the first part of the index? As far as I know the database would only need to check the GUIDs for each date, since both date and GUID will be delivered by the client for every attempt to insert an order:

The client would deliver the order, the orderlines for that order, details, whatever other info, the date and a GUID. If the server already has the date AND the GUID, it will ignore the insert and let the client know that the insert was successful. I know this will work, but would this be advisable in the long run, and how would it compare to using a single unique index with only a GUID - wouldn't the server have to look through a lot more data for each insert if it can't block the data by date?

Also, this has the added theoretical benefit of only having a risk of a duplicate GUID for each date. And I know this does not really matter in the real world, but it gives me peace of mind nonetheless.

Community
  • 1
  • 1
nickdnk
  • 4,010
  • 4
  • 24
  • 43
  • Where is the risk reduction when the GUID can only be "duplicate for each date"? That's no risk reduction *at all*. Either there is a risk of duplicates, or there is not. With GUIDs there is not, for all intents and purposes. The available number space is so enormously vast and the algorithm so carefully crafted that "GUID collision" is not a realistic risk to be worried about. – Tomalak Jan 05 '15 at 16:30
  • 1
    That being said, if you are concerned about insert/search performance and have thought of a few approaches, why not simply try it? A script that generates a few million fake rows is absolutely trivial. Do that, measure, scrap everything, change your strategy, repeat & compare the measurements. It's hard to predict DB server performance purely on paper. However, it's much easier to *explain* performance differences after there have been some measurements. (Personally, I would not expect using GUIDs to turn into a performance bottleneck, but YMMV.) – Tomalak Jan 05 '15 at 16:36
  • To the first comment: The index would be a combination of date and GUID. Hence, if you include date, your theoretically reset the pool of available GUIDs every day, since a duplicate GUID will have to be generated within the same date index to be considered a duplicate by the server, and yes, I know the risk is unrealistic, it was more of a trivial bonus. – nickdnk Jan 05 '15 at 19:49
  • And to the second: I'm very well aware, but I'm not good with benchmarking mysql, so I figured maybe someone would easily be able to explain the theoretical, and therefore most likely the practical, outcome of my intentions :) – nickdnk Jan 05 '15 at 19:50
  • 1
    Your fears regarding GUID collisions are unrealistic. There are so unbelievably many GUIDs that you probaly underestimate their number by several dozen orders of magnitude. http://stackoverflow.com/questions/2977593/is-it-safe-to-assume-a-guid-will-always-be-unique. Just forget optimizing for a case that will never ever happen and concentrate on your problem. GUIDs are infinitely unique for all you need to be concerned about. – Tomalak Jan 05 '15 at 20:00
  • To get an idea how to benchmark mySQL, turn to http://explainextended.com. – Tomalak Jan 05 '15 at 20:04
  • I know they are. I was not including collision in my reasoning, it was just a like "hey, also that". I will read up on that. Do you happen to know if the mysql uuid function follows some kind of ISO? I need to be able to merge it with generated IDs from Objective-C. As in they need to be the same length for correct field type. – nickdnk Jan 05 '15 at 20:19
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68244/discussion-between-tomalak-and-nickdnk). – Tomalak Jan 05 '15 at 20:31

0 Answers0