SQL - Update rows between two values in a column

Question

I have a data set as below in SQL Server:

ROW_NUM  EMP_ID  DATE_KEY  TP_DAYS
1        U12345  20131003   1
2        U12345  20131004   0
3        U12345  20131005   0
4        U12345  20131006   0
5        U12345  20150627   1
6        U12345  20150628   0
1        U54321  20131003   1
2        U54321  20131004   0
3        U54321  20131005   0
4        U54321  20131006   0

I need to update all the zeros in column TP_DAYS with values increment by 1 to the previous value.
The required result set will be as follows:

ROW_NUM  EMP_ID  DATE_KEY  TP_DAYS
1        U12345  20131003   1
2        U12345  20131004   2
3        U12345  20131005   3
4        U12345  20131006   4
5        U12345  20150627   1
6        U12345  20150628   2
1        U54321  20131003   1
2        U54321  20131004   2
3        U54321  20131005   3
4        U54321  20131006   4

I tried using LAG and LEAD function in SQL. But couldn't achieve the result as expected.

Can someone help me to achieve it.

score 2 · Accepted Answer · edited Jan 18 '21 at 12:31

Using windowed functions (SUM/ROW_NUMBER so it will work with SQL Server 2008):

WITH cte AS
(
  SELECT *, s =  SUM(TP_DAYS) OVER(PARTITION BY EMP_ID ORDER BY ROW_NUM)
  FROM #tab
), cte2 AS
(
  SELECT *,
    tp_days_recalculated = ROW_NUMBER() OVER (PARTITION BY EMP_ID, s ORDER BY ROW_NUM)
  FROM cte
)
UPDATE cte2
SET TP_DAYS = tp_days_recalculated;

SELECT *
FROM #tab;

LiveDemo

Output:

╔═════════╦════════╦══════════╦═════════╗
║ ROW_NUM ║ EMP_ID ║ DATE_KEY ║ TP_DAYS ║
╠═════════╬════════╬══════════╬═════════╣
║       1 ║ U12345 ║ 20131003 ║       1 ║
║       2 ║ U12345 ║ 20131004 ║       2 ║
║       3 ║ U12345 ║ 20131005 ║       3 ║
║       4 ║ U12345 ║ 20131006 ║       4 ║
║       5 ║ U12345 ║ 20150627 ║       1 ║
║       6 ║ U12345 ║ 20150628 ║       2 ║
║       1 ║ U54321 ║ 20131003 ║       1 ║
║       2 ║ U54321 ║ 20131004 ║       2 ║
║       3 ║ U54321 ║ 20131005 ║       3 ║
║       4 ║ U54321 ║ 20131006 ║       4 ║
╚═════════╩════════╩══════════╩═════════╝

#Addendum

Original OP question and sample data are very clear that tp_days indicators are 0 and 1 not any other values.

Especially for Atheer Mostafa:

check this example as a proof: https://data.stackexchange.com/stackoverflow/query/edit/423186

This should be new question, but I will handle that case:

;WITH cte AS
(
  SELECT *
   ,rn = s +  ROW_NUMBER() OVER(PARTITION BY EMP_ID, s ORDER BY ROW_NUM) -1
   ,rnk = DENSE_RANK() OVER(PARTITION BY EMP_ID ORDER BY s)
  FROM (SELECT *, s =  SUM(tp_days) OVER(PARTITION BY EMP_ID ORDER BY ROW_NUM)
        FROM #tab) AS sub
), cte2 AS
(
  SELECT c1.*,
   tp_days_recalculated = c1.rn - (SELECT COALESCE(MAX(c2.s),0)
                                   FROM cte c2
                                   WHERE c1.emp_id = c2.emp_id
                                     AND c2.rnk = c1.rnk-1)
  FROM cte c1
)
UPDATE cte2
SET tp_days = tp_days_recalculated;

LiveDemo2

Output:

╔═════════╦════════╦══════════╦═════════╗
║ row_num ║ emp_id ║ date_key ║ tp_days ║
╠═════════╬════════╬══════════╬═════════╣
║       1 ║ U12345 ║ 20131003 ║       2 ║
║       2 ║ U12345 ║ 20131004 ║       3 ║
║       3 ║ U12345 ║ 20131005 ║       4 ║
║       4 ║ U12345 ║ 20131006 ║       3 ║
║       5 ║ U12345 ║ 20150627 ║       4 ║
║       6 ║ U12345 ║ 20150628 ║       5 ║
║       1 ║ U54321 ║ 20131003 ║       2 ║
║       2 ║ U54321 ║ 20131004 ║       3 ║
║       3 ║ U54321 ║ 20131005 ║       1 ║
║       4 ║ U54321 ║ 20131006 ║       2 ║
╚═════════╩════════╩══════════╩═════════╝

it shouldn't change the values 3,4,2 to 1 .... this is the case. I don't need your solution when I have another generic answer, you don't tell me what to do ... thank you

Solution mentioned in comment is nothing more than quirky update. Yes it will work, but may easily fail:

First of all there is no such thing as ordered table per se
Query optimizer may read data in any way(especially when dataset is big and parallel execution is involved). Without ORDER BY you cannot guarantee the stable result
The behavior is not documented,might work today but could break in the future

Related articles:

Thank You so much. I am working with SQL Server 2012. Do we have any other function for replacing the ROW_NUMBER() — user2107971, Jan 13 '16 at 13:27
This query will not function as required if the TB_Days values in the initial table has values different than 1 "values increment by 1 to the previous value." ... try some initial values in TB_Days as 3 ,4 , 10 ... it will not function. thank you — Atheer Mostafa, Jan 13 '16 at 19:28
@AtheerMostafa This downvote is joke. The point is OP original question uses 1 as end of island. This is classic gap and island problem, but I reduced it for simplicity(0 and 1). If OP wants different values, he should write it. For case presented in question my answer is valid. Thank you. — Lukasz Szozda, Jan 13 '16 at 19:32
check this example as a proof: https://data.stackexchange.com/stackoverflow/query/edit/423186 — Atheer Mostafa, Jan 13 '16 at 19:34
@AtheerMostafa Please read carefully once again my last comment. 1 is for start/end of island. Not 2 not any other value. If OP wants different value he should edit question. If you want general case, ask new question and I show you solution that handle any other value as island border. Thank you. — Lukasz Szozda, Jan 13 '16 at 19:35
@lad it shouldn't change the values 3,4,2 to 1 .... this is the case. I don't need your solution when I have another generic answer, you don't tell me what to do ... thank you. — Atheer Mostafa, Jan 13 '16 at 19:42

Gordon Linoff · Answer 2 · 2016-01-14T02:35:38.860

0

Let me assume SQL Server 2012+. You need to identify groups that are delimited by 1. A simple way to calculate the group is by doing a cumulative sum of 1s. Then row_number() can be used to calculate the new value. You can do this work uisng an updatable CTE:

with toupdate as (
      select t.*,
             row_number() over (partition by empid, grp order by row_num) as new_tp_days
      from (select t.*, 
                   sum(tp_days) over (partition by emp_id order by row_num) as grp
            from t
           ) t
     )
update toupdate
    set tp_days = new_tp_days;

In earlier versions of SQL Server, you can accomplish the same thing (less efficiently). One method uses outer apply.

edited Jan 14 '16 at 02:35

answered Jan 13 '16 at 13:11

Gordon Linoff

1,242,037
58
646
786

1

Hi Gordon, It seems the grouping is not giving result as expected. I am re working on the same to produce the result as expected. Thanks for your help. – user2107971 Jan 13 '16 at 13:29
@user2107971 . . . I left `empid` out of the partition clause for `row_number()`. Oops. – Gordon Linoff Jan 14 '16 at 02:35

score -1 · Answer 3 · answered Jan 13 '16 at 14:48

-1

I have tricky much simpler way with a simple code as follows:

DECLARE @last int=0
UPDATE #Employees set @last=CASE WHEN TP_DAYS=0 THEN @last+1 ELSE TP_DAYS END,
TP_DAYS=CASE WHEN TP_DAYS=0 THEN @last ELSE TP_DAYS END

This runs in any SQL Server Engine Check the demo here

https://data.stackexchange.com/meta.stackoverflow/query/422955/sql-update-rows-between-two-values-in-a-column?opt.withExecutionPlan=true#resultSets

answered Jan 13 '16 at 14:48

Atheer Mostafa

735
3
8

**Tricky and not correct entirely.** This is some kind of `quirky update`. Good luck when parallelism is involved or query optimizer decides to read data in different order than you want. Before you suggest anyone this solution point out it weaknesses. **[More info](https://www.simple-talk.com/sql/learn-sql-server/robyn-pages-sql-server-cursor-workbench/)** – Lukasz Szozda Jan 13 '16 at 14:53
The results are correct as you can see in the demo I found several answers in StackOverflow that guarantee "Quirky Updates" as official answer as in this example: http://stackoverflow.com/questions/1153879/how-do-i-calculate-a-running-total-in-sql-without-using-a-cursor – Atheer Mostafa Jan 13 '16 at 15:36
**There is no such thing as ordered table.** If you quote other accepted answer read it entirely. Last comment: http://stackoverflow.com/a/11313533/26167 leads to **`Quirky update - do this at your own risk: The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it.`** If you suggest methods that are not stable add warning. – Lukasz Szozda Jan 13 '16 at 16:46
Also correct result does not proof anything(just try with bigger sample with parallelism). This is the same type of reasoning like people that says they don't need to use `ORDER BY` because data is read based on clustered index(which is obviously not true). – Lukasz Szozda Jan 13 '16 at 16:52
No because it is not correct as I write in my comment! – Atheer Mostafa Jan 13 '16 at 19:30

SQL - Update rows between two values in a column

3 Answers3