Make this SQL query more efficient

Question

I have the following query which checks goods receipts against purchase orders to see what items were originally ordered and how many have been booked in via goods receipts. E.g I place a purchase order for 10 banana milkshakes, I then generate a goods receipt stating that I received 5 of these milkshakes on said purchase order.

SELECT t.PONUM, t.ITMNUM, t.ordered, 
       SUM(t.received) as received, 
       t.ordered - ISNULL(SUM(t.received),0) as remaining, 
       SUM(t.orderedcartons) as orderedcartons, 
       SUM(t.cartonsreceived) as cartonsreceived, 
       SUM(t.remainingcartons) as remainingcartonsFROM(SELECT pod.PONUM, 
       pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received, 
       pod.DELIVERYSIZE as orderedcartons, 
       ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived, 
       (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
FROM TBLPODETAILS pod 
  LEFT OUTER JOIN TBLGRDETAILS grd 
    ON pod.PONUM = grd.PONUM and pod.ITMNUM = grd.ITMNUM) t
GROUP BY t.ITMNUM, t.PONUM, t.ordered
ORDER BY t.PONUM

Which returns the following data:

PONUM   ITMNUM  ordered received remaining orderedcartons cartonsreceived remainingcartons

1       1     5.0000    3.0000      2.0000   5.0000         3.0000          2.0000

Next I have a C# loop to generate update queries based on the data I get back from the above query:

foreach (DataRow POUpdate in dt.Rows) {...

query += "UPDATE MYTABLE SET REMAININGITEMS=" + remainingQty.ToString() 
       + ", REMAININGDELIVERYSIZE=" + remainingBoxes.ToString() + " WHERE ITMNUM=" 
       + itemNumber + " AND PONUM=" + poNumber + ";";

I then execute each update query against the DB. Which works fine on my local dev machine.

However deploying to production server pulls back over 150,000 records on that first query.

So looping around so many rows locks up SQL and my app. Is it the foreach? Is it the original select loading all that data into memory? Both? Can I make this query into one single query and cut out the C# loop? If so what's the most efficient way to achieve this?

how about formatting the query, I won't waste my time even looking at it the way it is now. UPDATE can work on a set of rows. you should try to work your SELECT into the UPDATE and you won't need the loop. — KM., Jul 12 '12 at 13:25
To be honest, I'd be **much** wore worried about the fact that you're clearly at risk of lost updates. — Marc Gravell, Jul 12 '12 at 13:27
To be honest, I'd be **much** more worried about the fact that you're clearly at risk of Sql injection... maybe not in this particular query, but if this is the normal pattern he uses for building the sql, he's in trouble. @MarcGravell — Joel Coehoorn, Jul 12 '12 at 13:30
@Oded, I'm not sure of your edit, what is up with the `)` at the end of the `ON` clause of the `LEFT OUTER JOIN`? — KM., Jul 12 '12 at 13:31
@KM- Good question. I didn't _change_ the query apart from formatting And adding spaces before keywords like `FROM`. — Oded, Jul 12 '12 at 13:33
And I think the title of the question could be improved... Just an FYI since it's locked for my editing now. — David Manheim, Jul 12 '12 at 13:45

score 5 · Accepted Answer · edited May 23 '17 at 11:56

In SQL, the goal should be to write operations on entire tables at once. The SQL server can be very efficient at doing so, but will need a significant overhead on any interaction, since it needs to deal with consistency, atomicity of transactions, etc. So in a way, your fixed cost per transaction is high, for the server to do its thing, but your marginal cost for additional rows in a transaction is very low - updating 1m rows may be 1/2 as fast as updating 10.

This means that the foreach is going to cause the SQL server to constantly go back and forth with your application, and that fixed cost of locking/unlocking and doing transactions is being occurred every time.

Can you write the query to operate in SQL, instead of manipulating data in C#? It seems you want to write a relatively simple update based on your select statement (See, for instance, SQL update from one Table to another based on a ID match.

Try something like the following (Not code tested, since i don't have access to your database structure, etc.):

UPDATE MYTABLE 
  SET REMAININGITEMS = remainingQty, 
  REMAININGDELIVERYSIZE=remainingBoxes
From 
(SELECT t.PONUM, t.ITMNUM, t.ordered, 
       SUM(t.received) as received, 
       t.ordered - ISNULL(SUM(t.received),0) as remaining, 
       SUM(t.orderedcartons) as orderedcartons, 
       SUM(t.cartonsreceived) as cartonsreceived, 
       SUM(t.remainingcartons) as remainingcartonsFROM(SELECT pod.PONUM, 
       pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received, 
       pod.DELIVERYSIZE as orderedcartons, 
       ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived, 
       (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
FROM TBLPODETAILS pod 
  LEFT OUTER JOIN TBLGRDETAILS grd 
    ON pod.PONUM = grd.PONUM and pod.ITMNUM = grd.ITMNUM) t
GROUP BY t.ITMNUM, t.PONUM, t.ordered
ORDER BY t.PONUM ) as x

join MYTABLE on MYTABLE.ITMNUM=x.itmnum AND MYTABLE.PONUM=i.ponum

AS pointed out in the comments to the question, SQL injection can be an issue. If you don't put in any input to the update, this query is safe from that possibility. — David Manheim, Jul 12 '12 at 13:47

Neville Kuyt · Answer 2 · 2012-07-12T13:54:02.430

As KM says in the comments, the problem here is coming back to the client app, and to then operate on each row with another database trip. That's slow, and can lead to stupid little bugs, which could cause bogus data.

Also, concatenating strings into SQL as you're doing is generally considered a very bad idea - SQL Injection (as Joel Coehoorn writes) is a real possibility.

How about:

create view OrderBalance 
as 
SELECT t.PONUM, t.ITMNUM, t.ordered, 
       SUM(t.received) as received, 
       t.ordered - ISNULL(SUM(t.received),0) as remaining, 
       SUM(t.orderedcartons) as orderedcartons, 
       SUM(t.cartonsreceived) as cartonsreceived, 
       SUM(t.remainingcartons) as remainingcartonsFROM(SELECT pod.PONUM, 
       pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received, 
       pod.DELIVERYSIZE as orderedcartons, 
       ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived, 
       (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
FROM TBLPODETAILS pod 
  LEFT OUTER JOIN TBLGRDETAILS grd 
    ON pod.PONUM = grd.PONUM and pod.ITMNUM = grd.ITMNUM) t
GROUP BY t.ITMNUM, t.PONUM, t.ordered

This seems to have exactly the data that your "MYTABLE" has - maybe you don't even need MYTABLE anymore, and you can just use the view!

If you have other data on MYTABLE, your update becomes:

UPDATE MYTABLE 
SET REMAININGITEMS       = ob.remainingitems, 
    REMAININGDELIVERYSIZE = ob.remainingBoxes
from MYTABLE mt 
   join OrderBalance ob 
on mt.ITMNUM = ob.itemNumber 
AND mt.PONUM = ob.poNumber

(Although, as David Mannheim writes, it may be better to not use a view and use a solution similar to the one he proposes).

As someone who works with database systems and applications, I'll beg you not to just create new views all of the time, especially when the goal here is to just perform an update. Instead, could you use a CTE instead of the view? It would have similar advantages without having the drawback of adding anything to the Database itself. — David Manheim, Jul 12 '12 at 13:40
Also, explicit joins are frequently easier to understand; `from MYTABLE mt join OrderBalance ob on mt.ITMNUM = ob.itemNumber AND mt.PONUM = ob.poNumber` — David Manheim, Jul 12 '12 at 13:41

Sergey Kalinichenko · Answer 3 · 2012-07-12T13:47:29.270

The other answers show you a great way to perform the whole update entirely in RDBMS. If you can do it like that, that's the perfect solution: you cannot beat it with a C# / RDBMS combination because of extra roundtrips and data transfer issues.

However, if your update requires some calculations that for one reason or the other cannot be performed in RDBMS, you should modify your code to construct a single parameterized update in place of a potentially gigantic 150000-row update that you are currently constructing.

using (var upd = conn.CreateCommand()) {
    upd.CommandText = @"
        UPDATE MYTABLE SET
            REMAININGITEMS=@remainingQty
        ,   REMAININGDELIVERYSIZE=@remainingBoxes
        WHERE ITMNUM=@itemNumber AND PONUM=@poNumber";
    var remainingQtyParam = upd.CreateParameter();
    remainingQtyParam.ParameterName = "@remainingQty";
    remainingQtyParam.DbType = DbType.Int64; // <<== Correct for your specific type
    upd.Parameters.Add(remainingQtyParam);
    var remainingBoxesParam = upd.CreateParameter();
    remainingBoxesParam.ParameterName = "@remainingBoxes";
    remainingBoxesParam.DbType = DbType.Int64; // <<== Correct for your specific type
    upd.Parameters.Add(remainingBoxesParam);
    ...
    foreach (DataRow POUpdate in dt.Rows) {
        remainingQtyParam.Value = ...
        remainingBoxesParam.Value = ...
        upd.ExecuteNonQuery();
    }
}

The idea is to make 150,000 updates that look the same into a single parameterized update that is actually a single statement.

This seems like it will not help with the speed problem, it will help with data transfer issue - which is real, but not issue with the speed of SQL server updates. If it can be done entirely in SQL, that would mean the application does not need to transfer ANYTHING back and forth, and SQL can do its data manipulation thing on its own. — David Manheim, Jul 12 '12 at 13:37
@DavidManheim This is true only if you can confine the calculations to your RDBMS. There is no evidence in the OP that this can be done! If setting REMAININGITEMS requires some calculations in C#, there's no easy way to push that onto RDBMS. Of course if the query is a plain re-assignment, then your solution would work; otherwise, it has no chance. — Sergey Kalinichenko, Jul 12 '12 at 13:39
I agree - but it seems that everything that is done here is in fact only an SQL update. I think we are seeing the frequent problem of a developer who needs to do SQL interaction without any support from the DBAs about why/how to use it. It may be that it can be done entirely in SQL - and if not, then your solution is applicable whereas mine is not. (PS. I didn't downvote your answer - it is a good one, if in fact other manipulation needs to occur in C#. I just didn't think it did.) — David Manheim, Jul 12 '12 at 13:44

Make this SQL query more efficient

3 Answers3