SQL Case statement to return index of duplicate rows

Question

I'm running a query to return multiple rows of employee hours data from multiple tables. There is a valid condition of employees putting a day shift of 8 hours per day to multiple customers for billing purposes. I want to update the "Index" column with a unique index for any duplicate across 3 columns

eg.

Emp   Hrs   Date   Index
Fred  8     11/7   1
Fred  8     11/7   2
Fred  8     11/7   3
Fred  10    12/7   1
Fred  9     13/7   1

In the Select statement these is a Case statement which returns a value in the Index column for a duplicate but it doesn't give a unique value

case when count(*) over (partition by Emp, Hrs, Date) > 1
    then 1
        else 0
    end

can I include a group by condition in a case statement to achieve this?

Why do you need to update the table with a value you can always determine at query time? Do you like having to perform this maintenance every time the table is touched? I would put this calculation in a view, not as a permanent column in the table. — Aaron Bertrand, Feb 15 '17 at 05:41

score 5 · Accepted Answer · edited May 23 '17 at 11:45

5

Try this:

UPDATE x
SET x.Index = x.Index_Calc
FROM
(
    SELECT Index, ROW_NUMBER() OVER (PARTITION BY Emp, Hrs, Date ORDER BY (SELECT 1)) AS Index_Calc
    FROM yourTable
) x

I based my answer off this Stack Overflow question, but I believe it should work here as well.

edited May 23 '17 at 11:45

Community

1
1

answered Feb 15 '17 at 04:48

Tim Biegeleisen

502,043
27
286
360

Do you mean `partition BY name, Hrs, d order by (select 1)`, order by is required for row_number(), isn't it? – artm Feb 15 '17 at 05:17
@artm You're right, in this case we can use `ORDER BY (SELECT 1)` to avoid any ordering. In the case of the OP, since there are records which appear completely identical, we don't care how SQL Server will choose to order internally. – Tim Biegeleisen Feb 15 '17 at 05:21
Thanks guys for the responses and assistance, as you can see I'm new to the forum and on my way to becoming dangerous with SQL. The table is built from multiple tables from a linked server and we probably should have Views but that is the way we've gone to better integrate with Power BI. Always learning!:) – Dave Feb 15 '17 at 12:01

score 1 · Answer 2 · answered Feb 15 '17 at 04:44

You can achieve this with row_number() and partition by.

Partition will give you a row number that resets every time a new "group" of values is encountered.

create table billings (employee varchar(100), hours decimal(10,2), billed_on date)
go

insert into billings values
('Mike', 8, '2017-02-01'),
('Mike', 8, '2017-02-01'),
('Mike', 8, '2017-02-01'),
('Mike', 8, '2017-02-19'),
('John', 10, '2017-02-01'),
('John', 8, '2017-02-01');

select  employee, hours, billed_on,
        row_number() over (partition by employee, billed_on order by employee) as ix
from    billings
order by billed_on, employee, ix

Produces:

    employee  hours billed_on    ix
1   John      10,00 01.02.2017   1
2   John       8,00 01.02.2017   2
3   Mike       8,00 01.02.2017   1
4   Mike       8,00 01.02.2017   2
5   Mike       8,00 01.02.2017   3
6   Mike       8,00 19.02.2017   1

The original problem asked for an _update_, not a select. – Tim Biegeleisen Feb 15 '17 at 04:46 — Tim Biegeleisen, Feb 15 '17 at 04:46

SQL Case statement to return index of duplicate rows

2 Answers2