I am trying to select rows from a table per group while skipping the first row of each group when the data is sorted by date. The sorting should be done on a date field, so that the newest entry in each group is ignored and the older ones are returned.

The table looks like

+----+------------+-------------+-----------+
| id | updated_on | group_name  | list_name |
+----+------------+-------------+-----------+
| 1  | 2013-04-03 | g1          | l1        |
| 2  | 2013-03-21 | g2          | l1        |
| 3  | 2013-02-26 | g2          | l1        |
| 4  | 2013-02-21 | g1          | l1        |
| 5  | 2013-02-20 | g1          | l1        |
| 6  | 2013-01-09 | g2          | l2        |
| 7  | 2013-01-10 | g2          | l2        |
| 8  | 2012-12-11 | g1          | l1        |
+----+------------+-------------+-----------+

http://www.sqlfiddle.com/#!2/cec99/1

So, basically, I just want to return the ids (3, 4, 5, 6, 8), as those are the older entries for each combination of group_name and list_name. The latest entry in each (group_name, list_name) group is ignored and the remaining ones are returned.

I have not been able to write the SQL for this problem. As far as I know, ORDER BY will not work together with GROUP BY here. Please help me figure out a solution.

Thanks

And also, is there a way to do this without using subqueries?

user2436575

3 Answers

Something like the following returns only the rows that are older than the newest date in their group:

select a.ID, a.updated_on, a.group_name, a.list_name
from data a
where a.updated_on <
(
  -- newest date within the same group_name
  select max(b.updated_on)
  from data b
  where b.group_name = a.group_name
);

SQL Fiddle: http://www.sqlfiddle.com/#!2/00d43/10

Update (based on your requirements)

select a.ID, a.updated_on, a.group_name, a.list_name
from data a
where a.updated_on <
(
  -- newest date within the same (group_name, list_name) pair
  select max(b.updated_on)
  from data b
  where b.group_name = a.group_name
    and b.list_name = a.list_name
);

See: http://www.sqlfiddle.com/#!2/cec99/3

Update (using a simple subquery instead of a correlated subquery)

I decided the correlated subquery was too slow, based on: Subqueries vs joins

So I changed to joining with an aliased derived table built from a nested query.

select a.ID, a.updated_on, a.group_name, a.list_name
from data a
join
(
  select group_name, list_name, max(updated_on) as MAX_DATE
  from data
  group by group_name, list_name
) as MAXDATE
  on a.list_name = MAXDATE.list_name
 and a.group_name = MAXDATE.group_name
where a.updated_on < MAXDATE.MAX_DATE
;

SQL Fiddle: http://www.sqlfiddle.com/#!2/5df64/8
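
To check the derived-table join without the fiddle, here is a minimal sketch that loads the eight sample rows from the question into an in-memory SQLite database and runs the same query. SQLite stands in for MySQL here, and the helper names load_sample and older_ids are made up for illustration; the dates are stored as ISO text, which compares correctly as strings.

```python
import sqlite3

# The eight sample rows from the question: (id, updated_on, group_name, list_name).
ROWS = [
    (1, "2013-04-03", "g1", "l1"),
    (2, "2013-03-21", "g2", "l1"),
    (3, "2013-02-26", "g2", "l1"),
    (4, "2013-02-21", "g1", "l1"),
    (5, "2013-02-20", "g1", "l1"),
    (6, "2013-01-09", "g2", "l2"),
    (7, "2013-01-10", "g2", "l2"),
    (8, "2012-12-11", "g1", "l1"),
]

# Same logic as the derived-table query above: join every row to the
# newest date of its (group_name, list_name) pair and keep the older rows.
DERIVED_TABLE_QUERY = """
SELECT a.id
FROM data a
JOIN (
    SELECT group_name, list_name, MAX(updated_on) AS max_date
    FROM data
    GROUP BY group_name, list_name
) AS maxdate
  ON a.group_name = maxdate.group_name
 AND a.list_name = maxdate.list_name
WHERE a.updated_on < maxdate.max_date
ORDER BY a.id
"""


def load_sample():
    """Create an in-memory database holding the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data ("
        "id INTEGER PRIMARY KEY, updated_on TEXT, group_name TEXT, list_name TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    return conn


def older_ids(conn):
    """Return the ids of all rows that are not the newest in their group."""
    return [row[0] for row in conn.execute(DERIVED_TABLE_QUERY)]


print(older_ids(load_sample()))  # [3, 4, 5, 6, 8]
```

Note that if two rows in a group share the newest date, the strict < comparison drops both of them.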

Menelaos
  • But I need all the entries of the group except the newest one – user2436575 May 30 '13 at 13:06
  • Oh ok... just a small edit is needed to use the max instead :) There you go... smaller than the max, i.e. the newest entry... – Menelaos May 30 '13 at 13:07
  • Updated the sqlfiddle with new data. Thanks for your answer, but it's not what I expected. The returned set should be (3,4,5,6,8), as id 7 is the newest in the group g2 & l2 – user2436575 May 30 '13 at 13:14
  • @user2436575 give me your link for the SQL Fiddle... I thought you wanted to ignore the newest entry PER group_name... you want to ignore the newest entry overall? – Menelaos May 30 '13 at 13:18
  • http://www.sqlfiddle.com/#!2/cec99/1 I forgot to post the link. I want to ignore the newest entry for each combination of group_name and list_name. It should return only (3,4,5,6,8) in this case – user2436575 May 30 '13 at 13:24
  • See: http://www.sqlfiddle.com/#!2/cec99/3 . Please update your question to reflect your requirements better. – Menelaos May 30 '13 at 13:30
  • Ok, thanks. Is there a way to do this without using subqueries? This table has 10 million records – user2436575 May 30 '13 at 13:33
  • @user2436575, could you run both queries on your data and post the results? – Menelaos May 31 '13 at 11:03

You could try using the following query (yes, it has a nested join, but maybe it helps).

SELECT ID FROM 
(select d1.ID FROM data d1 LEFT JOIN 
data d2 ON (d1.group_name = d2.group_name AND d1.list_name=d2.list_name AND 
d1.updated_on > d2.updated_on) WHERE d2.ID IS NULL) data_tmp;

CORRECTION:

SELECT DISTINCT ID FROM
(SELECT d1.* FROM data d1 LEFT JOIN
data d2 ON (d1.group_name = d2.group_name AND d1.list_name = d2.list_name AND
d1.updated_on < d2.updated_on) WHERE d2.ID IS NOT NULL) data_tmp;
David
  • Did you check what results it returns? According to: http://www.sqlfiddle.com/#!2/cec99/10 : {3,6,8} – Menelaos May 30 '13 at 15:25
  • I ran your query over 10,000 elements... there is a performance issue related to the unary join (self-join)... 10,000 x 10,000 = 100,000,000 rows before the where filter... :( – Menelaos May 31 '13 at 11:03
SELECT DISTINCT y.id 
  FROM data x 
  JOIN data y 
    ON y.group_name = x.group_name 
   AND y.list_name = x.list_name 
   AND y.updated_on < x.updated_on;
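
A quick way to see why the DISTINCT matters is to run the self-join on the sample rows with and without it. The sketch below does that in an in-memory SQLite database (standing in for MySQL). The index name idx_group_list_date and the helper self_join_ids are made up for illustration, and the composite index is only an assumption about what might help on a large table, not something measured in this thread.

```python
import sqlite3

# The eight sample rows from the question: (id, updated_on, group_name, list_name).
SAMPLE_ROWS = [
    (1, "2013-04-03", "g1", "l1"),
    (2, "2013-03-21", "g2", "l1"),
    (3, "2013-02-26", "g2", "l1"),
    (4, "2013-02-21", "g1", "l1"),
    (5, "2013-02-20", "g1", "l1"),
    (6, "2013-01-09", "g2", "l2"),
    (7, "2013-01-10", "g2", "l2"),
    (8, "2012-12-11", "g1", "l1"),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE data ("
    "id INTEGER PRIMARY KEY, updated_on TEXT, group_name TEXT, list_name TEXT)"
)
db.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# Assumption: an index covering the join columns (hypothetical name).
db.execute(
    "CREATE INDEX idx_group_list_date ON data (group_name, list_name, updated_on)"
)


def self_join_ids(distinct):
    """Run the self-join, optionally with DISTINCT, and return the ids in order."""
    keyword = "DISTINCT" if distinct else ""
    query = f"""
        SELECT {keyword} y.id
        FROM data x
        JOIN data y
          ON y.group_name = x.group_name
         AND y.list_name = x.list_name
         AND y.updated_on < x.updated_on
        ORDER BY y.id
    """
    return [row[0] for row in db.execute(query)]


# Each row y appears once per newer row x in its group, so id 8 shows up
# three times (newer rows 1, 4 and 5) unless DISTINCT collapses the matches.
print(self_join_ids(distinct=False))  # [3, 4, 5, 5, 6, 8, 8, 8]
print(self_join_ids(distinct=True))   # [3, 4, 5, 6, 8]
```

The number of intermediate rows grows with the square of the group size, which fits the slowdown reported in the comments on the previous answer.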
Strawberry