3

I've got a table with repeating IDs which I will fix later. Basically I want to return all the rows where the ID is distinct, but I want the ENTIRE row. Something like:

select * from table group by ID

select * from table where (ID is not repeated)

In this case, they are identical rows, so I don't care if it's First or Last, Min or Max.

Note that I DON"T want to do this:

select MIN(col1), MIN(col2), ... from table group by ID

I want a way to get this result without enumerating every column.

Edit: I'm using SQL Server 2008 R2.

Ehryk
  • 1,930
  • 2
  • 27
  • 47

4 Answers4

2

If you are using MySql, do this:

select 
    *
from tbl
group by ID

MySQL live test: http://www.sqlfiddle.com/#!2/8c7fd/2

If you are using Postgresql, do this:

select distinct on(id)
    *
from tbl
order by id

If you want Postgresql DISTINCT ON be at least as predictable as CTE windowing function. Sort another column:

select distinct on(id)
    *
from tbl
order by id
   , someColumnHere -- Choose ASC for first row, DESC for last row

Postgresql live test: http://www.sqlfiddle.com/#!1/8c7fd/1

If you are using CTE windowing-capable database (e.g. Postgres, Oracle, Sql Server), use this:

with ranked as
(
  select 
      rank() over(partition by id order by column) rn,
      *
  from tbl
)
select * from ranked where rn = 1

CTE windowing-capable databases:

Posgtresql: http://www.sqlfiddle.com/#!1/8c7fd/2

Oracle: http://www.sqlfiddle.com/#!4/b5cf9/1

Sql Server: http://www.sqlfiddle.com/#!3/8c7fd/3

Michael Buen
  • 38,643
  • 9
  • 94
  • 118
  • 1
    Unless Id is the only column, this won't work without supplying an aggregate function for each column – kroehre May 03 '12 at 23:42
  • Who said so that aggregate function is a necessity on MySql if you want to do a GROUP BY? See the result here: http://www.sqlfiddle.com/#!2/8c7fd/1 – Michael Buen May 03 '12 at 23:48
  • Op didn't specify MySql, and it is a necessity in other databases. I appreciate the lesson, but I was just trying to help. – kroehre May 03 '12 at 23:50
  • It's ok dude ;-) Really, MySQL violates many database fundamentals, that's why you presumed my first query is incorrect, yeah it is incorrect on virtually all database platform; however, it's correct in MySQL. See an example of what a good GROUP BY should be like and strive of doing so: http://www.ienablemuch.com/2010/08/postgresql-recognizing-functional.html – Michael Buen May 03 '12 at 23:59
  • I'm using SQL Server, so this isn't an option. It's good to know that I can do this in MySQL though, thanks. – Ehryk May 04 '12 at 00:53
  • you can also use row_number, it's similar with rank; I have yet to see where they do differs(though I think it has). rank is more semantic than row_number. it's like the difference between `` and `` tag :-) – Michael Buen May 04 '12 at 00:56
2

Since you didn't state in your question what database are you using, I suggest you make a query that will work on all database platforms. But this query requires you to make a new column with a property of auto_number,identity,serial,etc

This would be the query:

select * from tbl 
where (id,auto_number_here) in
   (select id, min(auto_number_here) 
    from tbl 
    group by id)

That will work on many platforms, except Sql Server. Sql Server is not tuple-capable. You have to do this:

select * from tbl x
where 
   -- leave this comment, so it mimics the tuple capability
   -- (id,auto_number_here) in
   EXISTS
   (select
       -- Replace this:  
       -- id, min(auto_number_here) 

       -- With whatever floats your boat, 
       -- you can use 1, null(the value generated by Entity Framework's EXIST clause), 
       -- even 1/0 is ok :-) (this will not result to divide-by-zero error)

       -- But I prefer retaining the columns, so it mimics the tuple-capable database:
       id, min(auto_number_here) 

    from tbl 
    where id = x.id 
    group by id
    having x.auto_number_here = min(auto_number_here))

Tuple-related question: using tuples in sql in clause

Since some database doesn't support tuple, you can simulate it instead

select z.* from tbl z
join (select id, min(auto_number_here) as first_row from tbl group by id) as x
on z.id = x.id and z.auto_number_here = x.first_row

It's a bit better than EXISTS approach. But if your database supports tuple use it instead; as much as possible, use JOIN for reflecting table relationships only, and use WHERE clause for filtering.


UPDATE

Perhaps a concrete example could explain it clearly, suppose we have existing table that we forgot to put a primary key on:

create table tbl(
  id varchar(5), -- supposedly primary key 
  data int,
  message varchar(100) 
);


insert into tbl values
('A',1,'the'),
('A',1,'quick'),
('A',4,'brown'),
('B',2, 'fox'),
('B',5, 'jumps'),
('B',5, 'over'),
('C',6, 'the'),
('C',7, 'lazy');

In order to pick up only one row out of duplicates, we need to add a third column on existing data.

This will aid us on picking up one and only one row out of duplicates

alter table tbl add auto_number_here int identity(1,1) not null;

This shall work now:

select z.* from tbl z
join (select id, min(auto_number_here) as first_row from tbl group by id) as x
on z.id = x.id and z.auto_number_here = x.first_row

Live test: http://www.sqlfiddle.com/#!6/19b55/3

And so is this:

select * from tbl x
where 
   -- leave this comment, so it mimics the tuple capability
   -- (id,auto_number_here) in
   EXISTS
   (
     select
       -- Replace this:  
       -- id, min(auto_number_here) 

       -- With whatever floats your boat, 
       -- you can use 1, null(the value generated by Entity Framework's EXIST clause), 
       -- even 1/0 is ok :-) (this will not result to divide-by-zero error)

       -- But I prefer retaining the columns, so it mimics the tuple-capable database:
       id, min(auto_number_here) 

    from tbl 
    where id = x.id 
    group by id
    having x.auto_number_here = min(auto_number_here)

   )

Live test: http://www.sqlfiddle.com/#!6/19b55/4

Community
  • 1
  • 1
Michael Buen
  • 38,643
  • 9
  • 94
  • 118
  • What is auto_number_here? I'm not sure I'm understanding that part. – Ehryk May 04 '12 at 03:04
  • auto_numbered column. it's like Sql Server's identity – Michael Buen May 04 '12 at 03:05
  • 942 rows, just like what I'm getting with a select *. There's only 880 unique IDs. – Ehryk May 04 '12 at 03:07
  • Have you created a unique number for the entire rows. Each row should have their own unique number. You can use identity(1,1) for that column. Then the above query will work. Use a column that has no repeating values (ideally from identity(sql server),auto_number(mysql),serial(postgresql)) for the `auto_number_here` on your `MIN(auto_number_here)` expression – Michael Buen May 04 '12 at 03:30
  • Don't use a column with duplicate values on that `MIN` aggregate – Michael Buen May 04 '12 at 03:32
  • I gave a new example on my answer now. Check its sqlfiddle too – Michael Buen May 04 '12 at 03:42
  • If your query involves multiple one-to-many tables, perhaps you should flatten out your query results first before your query pick up one row out of it. See example of query results flattening http://www.anicehumble.com/2012/05/sql-count-computer-program-does-what.html – Michael Buen May 04 '12 at 03:49
1

Try this out. It might work because all columns in the two rows are identical.

select distinct *
from table
SQLCurious
  • 514
  • 3
  • 10
  • That worked somewhat. 880 results in the table, 942 in the view. I only want the 880 (the view is doing some joins that are adding unwanted rows, and I don't have time to rewrite them right now). That command returned 899, adding 19 results - so apparently they aren't exactly identical - though the ID columns are. – Ehryk May 03 '12 at 23:43
  • In that case, you'll have to choose between the two rows, writing out Min(Col1), ... After all, SQL doesn't know how to decide which of two slightly different rows to return. – SQLCurious May 03 '12 at 23:47
1

Use a subquery to get your unique ids, then use that to filter the results:

SELECT *
FROM YourTable t,
INNER JOIN (
  SELECT Id, COUNT(*) 'count'
  FROM YourTable
  GROUP BY Id
) sq ON sq.Id = t.Id
WHERE sq.count = 1
kroehre
  • 1,104
  • 5
  • 15
  • The second subquery is a bit overkill. You could just do `SELECT sqq.Id, COUNT(1) 'count' FROM YourTable sqq GROUP BY sqq.Id` – sazh May 04 '12 at 00:23
  • @diaho good point, not sure what I was thinking there. Thanks! – kroehre May 04 '12 at 00:26
  • Both of these return 831 rows, where there are 880 unique rows. Essentially I do want the ones with count > 1, but only the first one. – Ehryk May 04 '12 at 00:51
  • Suggesting `COUNT(1)` for `COUNT(*)` is a [Cargo Cult Programming](http://en.wikipedia.org/wiki/Cargo_cult_programming) . See this: http://www.ienablemuch.com/2010/04/debunking-myth-that-countdracula-is.html And this: http://www.ienablemuch.com/2010/05/why-is-exists-select-1-cargo-cult.html – Michael Buen May 04 '12 at 01:01
  • what's overkill here is he can move the count filter from WHERE clause to inside of subquery and put it in HAVING. `INNER JOIN(SELECT id FROM YourTable GROUP BY Id HAVING COUNT(*) = 1) sq ON sq.Id = t.Id`. But just the same, this is not what the OP asked, he don't want rows with no duplicates only – Michael Buen May 04 '12 at 01:07
  • reason for downvote? Also @Michael Buen please stick to the question asked instead of shoving a ton of unnecessary content down everyone's throats. It's not productive but it's certainly very annoying. We get it, you are the world's foremost expert on databases. – kroehre May 04 '12 at 19:39