For perfromance - While Loop or Row_Number should be used to update a dataset?

Question

Can someone explain if a While Loop with a RowCounter is faster or slower than using ROW_NUMBER in SQL server?

Or is there no difference?

The existing stored procedure is too complex to put it all in a View and is very slow.. this is the While Loop, but how could it be done with ROW_NUMBER in Sql Server 2008 R2, and would there be any performance benefit?

DECLARE @StopRow AS INT

SELECT @StopRow = MAX(RowID)
FROM #Temp1

DECLARE @RowCounter AS INT

SET @RowCounter = 1

DECLARE @colID INT

WHILE (@RowCounter <= @StopRow)
BEGIN
    SELECT @colID = colID
    FROM #Temp1
    WHERE colRowID = @RowCounter

    IF (
            EXISTS (
                SELECT ParentColID
                FROM ParentTable a WITH (NOLOCK)
                JOIN MoreTableData b WITH (NOLOCK) ON a.priID = b.priID
                    AND colID = @colID
                WHERE anotherID NOT IN (
                        SELECT anotherID
                        FROM @anotherTempTable
                        )
                )
            )
        UPDATE #Temp1
        SET aFlag = 0
        WHERE colRowID = @RowCounter

    SET @RowCounter = @RowCounter + 1
END

Found this StackOverflow now that I understand the SET based (UPDATE) verbage .. How to convert a loop in SQL to Set-based logic

Doing things row by row is one of the most expensive ways to go about this. Take advantage of SQL's set based optimizations and try to rewrite this without a loop. — Michael McGriff, Jul 01 '14 at 19:01
Please do not write RBAR code (row by agonizing row). 99.99% of the time, there is a "set based" way to do what you want. — granadaCoder, Jul 01 '14 at 19:08
TY - while I am coming in to optimize after-the-fact - I still have much to learn about doing it correctly. — JoJo, Jul 01 '14 at 19:21

score 2 · Answer 1 · answered Jul 01 '14 at 18:58

why u nead any loop? Can't u do in one UPDATE ?

   UPDATE T
        SET aFlag = 0       
   FROM #Temp1 AS T       
   WHERE EXISTS (
                SELECT ParentColID
                FROM ParentTable a WITH (NOLOCK)
                JOIN MoreTableData b WITH (NOLOCK) 
                    ON a.priID = b.priID
                    AND a.colID = t.colID
                WHERE anotherID NOT IN (
                        SELECT anotherID
                        FROM @anotherTempTable
                        )
                )

Not tested.

score 1 · Accepted Answer · edited May 23 '17 at 12:34

Usually, a set-based solution offers a better performance than a cursor based solution.

If performance is a concern, I would try to rewrite my solution using a set based solution thus:

UPDATE  t1
SET     aFlag = 0
FROM    #Temp1 t1
WHERE   colRowID BETWEEN 1 AND (SELECT MAX(RowID) FROM #Temp1)
AND     EXISTS 
(
    SELECT ParentColID
        FROM ParentTable a 
        JOIN MoreTableData b ON a.priID = b.priID
        AND colID = t1.colID
        WHERE anotherID NOT IN 
        (
            SELECT anotherID
            FROM   @anotherTempTable
        )
)

Then, on a dev/server server I would use a SQL Trace / Extended events session to track following main KPIs: Duration, CPU, [Logical] Reads and I would compare these two solutions (cursor based vs. set based).

Note #1: (SELECT MAX(RowID) FROM #Temp1) sub-query should be evaluated just one time because it isn't correlated sub-query.

Note #2: Try to replace NOT IN with NOT EXISTS. Reason.

an amazing return. original query was 25 seconds and now returning in 7 seconds. That is 18 seconds faster! — JoJo, Jul 01 '14 at 19:31

For perfromance - While Loop or Row_Number should be used to update a dataset?

2 Answers2