Alternative way of using DISTINCT in SQL Server 2008

Question

I have a table with millions of rows of data in SQL Server 2008. I am trying to find an alternative way instead of using distinct. Please see the query below:

create table #temp (id int)
create table #temp2 (id int, name varchar(55), t_id int)

insert into #temp values (1)
insert into #temp2 values (1,'john',1)
insert into #temp2 values (2,'alex',1)
insert into #temp2 values (3,'alex',1)

select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id

This query returns output like:

Id  Name
1   john
1   alex
1   alex

The expected output is:

Id Name
1  john
1  alex

I can provide the expected output by using DISTINCT keyword, I know it, but it decreases the performance. Could you please advise me some professional alternative ways (except using group by) to handle it? Thanks!

Edit: I have a custom concentenate function which helps me to do:

select t.id, concetenate(t2.name)
from #temp t
inner join #temp2 t2 on t.id = t2.t_id

and this is returning 1 john,alex,alex. I am looking for a way to get rid of one of the alex without updating the function and do not want to use "distinct" keyword.

You could use `ROW_NUMBER()` within a CTE and partition over your columns. Then just return rows with a value of 1. — Thom A, Feb 12 '18 at 16:57
I'm going to guess there's nearly 100.01% chance at an XY problem here. — Jacob H, Feb 12 '18 at 16:57
You have given no sense of what your data really looks like and `select distinct` would be such minor overhead on the example data that it would not be noticeable. — Gordon Linoff, Feb 12 '18 at 16:58
[What's faster group by or distinct](https://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql) — Jacob H, Feb 12 '18 at 16:58
OP how about you explain the problem instead of assuming the answer will be "something faster than distinct or group by" because you're probably having index problems or something completely unrelated to the keyword in your query. — Jacob H, Feb 12 '18 at 17:00
But you have three rows with t.id = t2.t_id = 1. Why would you *expect* anything else? Unfortunate to not want to use distinct to do exactly what it is intended to do. Are you really having performance issues with distinct? — paparazzo, Feb 12 '18 at 17:48

Mazhar · Accepted Answer · 2018-02-13T08:26:21.043

0

Use GROUP BY

select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
GROUP BY t.id, t2.name

Use a CTE and ROW_NUMBER and your custom "concetenate" function

;WITH cte
AS(
    select t.id, t2.name, RN=ROW_NUMBER()OVER(PARTITION BY t2.name ORDER BY t2.id)
    from #temp t
    inner join #temp2 t2 on t.id = t2.t_id
)
SELECT C.id
     , Name =concetenate(C.name)
FROM cte C WHERE C.RN = 1

edited Feb 13 '18 at 08:26

answered Feb 12 '18 at 16:58

Mazhar

3,797
1
12
29

Oh thanks, forgot to tell in the question "Except group by", my bad. – Emel Uras Feb 12 '18 at 16:59
@EmelUras Updated the answer with an alternative solution – Mazhar Feb 12 '18 at 17:03

score 0 · Answer 2 · answered Feb 12 '18 at 17:00

You can use group by like below - But why you are inserting duplicates in yourtable..

create table #temp (id int)
create table #temp2 (id int, name varchar(55), t_id int)

insert into #temp values (1)
insert into #temp2 values (1,'john',1)
insert into #temp2 values (2,'alex',1)
insert into #temp2 values (3,'alex',1)

select t.id, t2.name
from #temp t
inner join #temp2 t2 on t.id = t2.t_id
GROUP BY t.id, t2.name

Another solution is we can create a constraint to restrict the duplicate values.

Alternative way of using DISTINCT in SQL Server 2008

2 Answers2