*Skip to bold text for less explanation
I have been reading a couple of articles and threads on the pros/cons of using UUID/GUID vs. auto-incremented integer for primary key (or unique constraint) such as this one and UUID performance in MySQL?
My scenario is: Client generates an order that MUST be unique. Client sends order through semi-bad wifi or 3G to web-server that handles the request by PHP, inserts order into database and tells client that everything went well. Client flushes cache and starts rebuilding it with new orders until it can be emptied again, which is usually instantly, but that has nothing to do with the problem, which is:
The "semi-bad wifi or 3G"-part, since this will occasionally result in duplicate order entries in the database in cases where the client is not successfully informed that the order insert was successful, and it retries the same order.
The database will of course just auto increment the identical order with a new PK, so an auto_increment-only solution does not work.
My current solution is that the client is registered with a unique ID (just integer) that's given out by the server as the client registers as a client. The client then does its own auto_increment by just starting at 1 and going until the end of time for every order. The orders-table then has a composite unique index consisting of two integer rows, one for client id and one for the clients own order id.
This approach works, for now, but I have two concerns:
- How badly does a composite index of two integers in random order affect insert performance?
- What happens in the case of a client rollback, such as if the client restores from a backup, which will mess up the clients own order-id-system? Well I can answer that myself: Actual orders will be treated as inserted but then thrown out (INSERT IGNORE), since the database will think it already has them.
So, my new solution was to generate UUID/GUIDs client side and use those to prevent duplicate inserts. I then came to the conclusion that GUIDs alone would eventually - as the table grows - affect insert performance severely compared to using auto_increments. To prevent this negative effect, I thought of the following, which is the actual question:
Since all orders are marked by date, and a date will only insert so many orders (in this system rarely more than a thousand per day), and I'm not concerned with using GUIDs as identifiers or for merging purposes - I intend to keep the auto_increment integer as PK and order identification - would it be a good idea to combine the date with a GUID, with the date being the first part of the index? As far as I know the database would only need to check the GUIDs for each date, since both date and GUID will be delivered by the client for every attempt to insert an order:
The client would deliver the order, the orderlines for that order, details, whatever other info, the date and a GUID. If the server already has the date AND the GUID, it will ignore the insert and let the client know that the insert was successful. I know this will work, but would this be advisable in the long run, and how would it compare to using a single unique index with only a GUID - wouldn't the server have to look through a lot more data for each insert if it can't block the data by date?
Also, this has the added theoretical benefit of only having a risk of a duplicate GUID for each date. And I know this does not really matter in the real world, but it gives me peace of mind nonetheless.