0

Triggered by a web-hook of an e-commerce platform after a new order, I have to update a table with the order's details in a local system. So for the customer's info insertion part, I need to determine whether the customer is an existing one, or should be inserted as a new customer...

So, I wrote this piece of code:

        sqlsrv_query($connection, sprintf("INSERT INTO z_web_users (UserID, OrderID, FirstName, LastName, Country, State, Region, Street, Zip, Phone, Email) VALUES ('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s');",
            sqlsrv_fetch_array(sqlsrv_query($connection, "SELECT COALESCE((SELECT DISTINCT UserId FROM z_web_users WHERE Email = '{$order->billing->email}'), (SELECT MAX(UserId) + 1 FROM z_web_users), 1) AS ID;"), SQLSRV_FETCH_ASSOC)['ID'],
            $order->id,
            $order->billing->first_name,
            $order->billing->last_name,
            $order->billing->country,
            $states[$order->billing->state],
            $order->billing->city,
            $order->billing->address_1,
            $order->billing->postcode,
            $order->billing->phone,
            $order->billing->email,
        ));

The problem is a got many suggestions that I should add some mechanism to ensure the returned MAX(UserId) stays out of conflicts. Given that the table z_web_users doesn't take any measures by its design, nor do I have any control over its design, what can I do in my code to ensure that?

Faye D.
  • 833
  • 1
  • 3
  • 16
  • 2
    You'll want to avoid roundtripping this through PHP and do it in a stored procedure on the SQL Server itself. Any delay (even milliseconds) between selecting the max ID and inserting a new record is an opportunity for concurrency issues - where two or more concurrent requests will attempt to insert using the same ID. This is why [identity columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property) exist, and [sequences](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql) for the occassions you can't use identity. – AlwaysLearning Mar 29 '22 at 22:59
  • Unfortunately my knowledge on T-SQL is next to non-existent... I'm not familiar with stored procedures etc. I was suggested a `SEQUENCE` solution, and basically I was hoping for a suggestion on how to incorporate that in the code I put in my original question. :D – Faye D. Mar 29 '22 at 23:07
  • 2
    ^^^ the time might have come to learn... its never going to hurt your career having a good understanding of T-SQL. – Dale K Mar 29 '22 at 23:23
  • 1
    ^ agreed, I've answered or commented on several of your SQL Server questions, so you can't keep claiming you don't use it enough to learn it :-) – Aaron Bertrand Mar 30 '22 at 00:25
  • All those (8-9 I think) questions had to do with the same project I had taken up at that time. And even this question is just to build an extension on it. In my every day job I program on PHP, nothing to do with SQL Server. Nevertheless, indeed expanding my knowledge is something good! If only a day had 28 hours! :P Anyway, I'll look into this, and I may come back! :) – Faye D. Mar 30 '22 at 00:31
  • 1
    I would have thought even the little experience you have with SQL Server would have told you [not to inject data into your query](https://stackoverflow.com/questions/7505808/why-do-we-always-prefer-using-parameters-in-sql-statements) though – Charlieface Mar 30 '22 at 20:58

1 Answers1

0

Well, the first thing I would say is that you should avoid rolling your own id sequence mechanism if at all possible. This is a case of reinventing an existing well oiled wheel.

So, if you really have to roll your own, then I would use a stored procedure with the following content which takes a complete table lock, therefore ensuring only 1 application can access this at a time.

This is a simple way to solve the concurrency issue, and not the most optimal, but it will solve the problem in the short term while you investigate a better solution.

BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    BEGIN TRAN;

    DECLARE @Id INT;

    -- Once we get this lock, we have exclusive access to the table until the end of the transaction
    SELECT @Id = COALESCE(MAX(id), (SELECT COALESCE(MAX(id), 0) + 1 FROM Users))
    FROM Users WITH (TABLOCKX)
    WHERE Email = @Email;

    -- Put your original insert here, modified to use the @Id we have already obtained
    INSERT INTO Users (Id, OtherCols...)
        VALUES (@Id, OtherCols...);

    COMMIT;

    RETURN 0;
END;

Note: The Create Procedure Docs provide all you need to know to encapsulate this into a stored procedure.

And you'll need to research how to call a stored procedure from your environment.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • I don't understand how I can incorporate this to my situation! Your suggestion seems to handle everything within SQL Server, while my word so far uses the PHP `sqlsrv` driver to perform the necessary commands within PHP. – Faye D. Mar 29 '22 at 23:24
  • I didn't downvote but you select `TOP 1` from `Users` with `TABLOCKX`, why not just `SELECT @Id = MAX(Id)` there? Then the insert just has to say `COALESCE(@Id+1, 1)` and not reference `Users` again (if it didn't get a value the first time, it can't possibly get a value the second time, either). – Aaron Bertrand Mar 30 '22 at 00:21
  • @AaronBertrand I've done that, it still needs a second query against the same table, but I guess the lock still applies correctly. – Dale K Mar 30 '22 at 00:24
  • @FayeD. so the reason you have ended up immersed in this complex SQL is because you are trying to roll-your-own identity mechanism. So it should really have been said prior to now that this is a really bad idea. You should if at all possible just use a built in `identity` column and then modify your app to handle that fact that every ID will be unique. This is a much easier change than rolling your own, custom generated ID. – Dale K Mar 30 '22 at 00:39
  • That table I'm trying to insert new customer rows to is existing. I didn't build it. I only have to write new customer records in it. And I have to do it like this: When a new customer makes a purchase with a first-seen email ID, I'll have to generate (in fact MAX()+1) new customer ID and insert the row. When a customer that used an existing email in this table makes a purchase, I will still have to insert the row, but with the same customer ID that was generated for that particular email! – Faye D. Mar 30 '22 at 00:44
  • I understand, but you would probably find it easier to change the id to an identity column, and modify your app to handle that. The interesting question is, how did it work before you came alone? – Dale K Mar 30 '22 at 00:45
  • When they explained the requirements I stayed speechless due to the poor design... But that's how they built it (afterall it's only an intermediate table that's used as a "cache" to calculate other tables' rows - so it doesn't really matter TBH). – Faye D. Mar 30 '22 at 00:46
  • No no, I have no control on the design of this table unfortunately. It is what it is, and I only have to insert new/existing customer rows using the way I described it in my previous comment! New email -> new (next) id / Current email -> Corresponding id – Faye D. Mar 30 '22 at 00:48
  • How did they do these inserts in the past? – Dale K Mar 30 '22 at 00:49
  • They generated a new (next) ID for any customer email (new/existing) and had to search for the email address and return the row with the MIN() ID of it. – Faye D. Mar 30 '22 at 00:51
  • Which is a much better way to do it. You should stick with it. – Dale K Mar 30 '22 at 00:52
  • But in that case, they still had to run the MAX() + 1, and then use that new (next) ID to store the customer. My job was to take care of the ID always incrementing even for existing emails. Which I (with your help) solved with the COALESCE solution, where I either take the ID where the email is the given one, or return the MAX() + 1 id – Faye D. Mar 30 '22 at 00:58
  • Yip, but its a better solution than rolling your own ID as you are suggesting (IMO). Because as you can see its complex to roll your own. – Dale K Mar 30 '22 at 00:59
  • That's what I'm trying to explain, that they were rolling their own (MAX + 1) as well. And they did it for all emails (existing and new). – Faye D. Mar 30 '22 at 01:00
  • Oh OK, but that would be easy to replace with an identity - whereas your new requirements can't be. Regardless, if you roll your own you need to address concurrency issues. – Dale K Mar 30 '22 at 01:02
  • 1
    Personally, if I *really* had to do this, I would do a single statement with no transaction: `INSERT ... SELECT TOP (1) @Id = MAX(id) + 1 FROM Users WITH (UPDLOCK, HOLDLOCK) WHERE Email = @Email ORDER BY Id DESC` or something like that. There is no need to lock the whole table against reads – Charlieface Mar 30 '22 at 12:30
  • @Charlieface for sure add as another answer - you're more experienced in this than me. Somehow I think the OP is going to ignore the issue though :) – Dale K Mar 30 '22 at 18:49
  • 1
    I'm sure it'll be revisited when it eventually breaks and causes minor havoc. Happened to me with a legacy app using select max()+1 once. Never made *that* mistake again! – SOS Mar 30 '22 at 21:20
  • @DaleK no, I'll not ignore this. I never ignore useful suggestions, especially ones made by people with more knowledge than me. I've just put it on hold until I complete the whole syncing process, then I'll come back to make this as robust as it can be given the limitations of the system due its initial design flaws... – Faye D. Apr 02 '22 at 02:11