1

I have a little problem here that I'll try to explain. I want to fetch the latest (highest id) maximum 10 rows (can exist more/less than 10) per each unique id in a table.

So if I'm interested in knowing the latest 10 rows from id "1" and "2" I would set the LIMIT to 20 (2 * 10).

This is what my current query looks like right now (which incorrectly will fetch latest 20 rows of the first id since it exists more than 10 rows for that id).

SELECT positions.id
     , trackedpersons.name
     , trackedpersons.id
     , events.name
     , events.route
     , positions.latitude
     , positions.longitude
     , positions.datetime 
  FROM trackedpersons
     , positions
     , events 
 WHERE trackedpersons.id IN (1,2) 
   AND events.id = 1 
   AND events.id = positions.eventid 
   AND positions.trackedpersonid = trackedpersons.id 
 ORDER 
    BY trackedpersons.id
     , positions.id DESC 
 LIMIT 20;
Strawberry
  • 33,750
  • 13
  • 40
  • 57
Peter Warbo
  • 11,136
  • 14
  • 98
  • 193
  • Do not use implicit (comma-) join syntax. Instead, always use explicit JOIN syntax. Aside from that, this is a Top-N question (of the sort that gets asked and answered here every single day) – Strawberry May 07 '14 at 12:35
  • 1
    @Strawberry i mostly use implicit joins, what the difference? – user2009750 May 07 '14 at 12:36
  • even i too want to know :) – Ravi Dhoriya ツ May 07 '14 at 12:37
  • @Strawberry how much helps your suggestion for Peter? – flaschenpost May 07 '14 at 12:37
  • 2
    MySQL evaluates joins in a different order depending upon whether they're implicit or explicit. If they're all implicit, this won't matter (in the same way that 2*3*6 = 6*2*3), but when combined with OUTER JOINs (which cannot be expressed implicitly), you'll start to get errors or unexpected results. Plus, it's harder to read. – Strawberry May 07 '14 at 12:39
  • 1
    @flaschenpost It's a comment, not an answer. – Strawberry May 07 '14 at 12:39
  • why not use union first query with condition `WHERE trackedpersons.id = 1 ... order by ... limit 10` and then 2nd query `WHERE trackedpersons.id = 2 ... order by ... limit 10` – Abhik Chakraborty May 07 '14 at 12:40
  • @AbhikChakraborty Does that solution scale well? – Strawberry May 07 '14 at 12:41
  • no it will not and getting 2 different set of data with condition and then getting subset from both condition is always expensive and I would be interested to know if there is a better solution. – Abhik Chakraborty May 07 '14 at 12:42
  • @AbhikChakraborty, that good solution I'm using since long back, But the problem would be when there are more number of groups that just `two` it would make complicated query, see [this](http://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results) – Ravi Dhoriya ツ May 07 '14 at 12:43
  • check this out http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ for selecting the top N rows from each group – Viswanath Polaki May 07 '14 at 12:43
  • @AbhikChakraborty There is. As I say, this question has been answered many, many times here on SO and elsewhere, so I won't bore you by repeating it here. – Strawberry May 07 '14 at 12:44
  • well I gave an answer yesterday but its a different situation where explicit join could be made and then group the data in order http://stackoverflow.com/questions/23501312/load-three-posts-by-user/23501484#23501484 but here there is an explicit condition `WHERE trackedpersons.id IN (1,2) ` and thats the reason I would like to know how it could be optimized. – Abhik Chakraborty May 07 '14 at 12:46
  • I'm using implcit join because this is a dynamic generated query in Java. I can not now beforehand what (and how many) id's to fetch the results for... does that make sense? – Peter Warbo May 07 '14 at 12:52
  • The second part makes sense. The first part does not make sense. – Strawberry May 07 '14 at 13:48
  • @Strawberry can you give me a hint how this query can be rewritten, you seem to know the solution... – Peter Warbo May 07 '14 at 14:21

1 Answers1

0

Very well; here's one way - although it's not the most efficient...

Consider the following...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table 
(id INT NOT NULL
,dt INT NOT NULL
,PRIMARY KEY(id,dt)
);

INSERT INTO my_table VALUES
(101,1),
(101,2),
(101,3),
(101,4),
(102,1),
(102,2),
(102,3),
(102,4),
(102,5),
(103,1),
(103,2),
(103,3),
(104,1),
(104,2),
(105,1);

SELECT * FROM my_table;
+-----+----+
| id  | dt |
+-----+----+
| 101 |  1 |
| 101 |  2 |
| 101 |  3 |
| 101 |  4 |
| 102 |  1 |
| 102 |  2 |
| 102 |  3 |
| 102 |  4 |
| 102 |  5 |
| 103 |  1 |
| 103 |  2 |
| 103 |  3 |
| 104 |  1 |
| 104 |  2 |
| 105 |  1 |
+-----+----+

To discover the position (rank) of each row within its group, we can do this...

SELECT x.*
     , COUNT(*) rank 
  FROM my_table x 
  JOIN my_table y 
    ON y.id = x.id 
   AND y.dt >= x.dt 
 GROUP 
    BY x.id,x.dt 
 ORDER 
    BY id
     , rank;
+-----+----+------+
| id  | dt | rank |
+-----+----+------+
| 101 |  4 |    1 |
| 101 |  3 |    2 |
| 101 |  2 |    3 |
| 101 |  1 |    4 |
| 102 |  5 |    1 |
| 102 |  4 |    2 |
| 102 |  3 |    3 |
| 102 |  2 |    4 |
| 102 |  1 |    5 |
| 103 |  3 |    1 |
| 103 |  2 |    2 |
| 103 |  1 |    3 |
| 104 |  2 |    1 |
| 104 |  1 |    2 |
| 105 |  1 |    1 |
+-----+----+------+

...which can be rewritten thus (obtaining the top 3 from each id)...

SELECT x.*  
  FROM my_table x 
  JOIN my_table y  
    ON y.id = x.id 
   AND y.dt >= x.dt 
 GROUP 
    BY x.id
     , x.dt 
HAVING COUNT(*) <= 3;
+-----+----+
| id  | dt |
+-----+----+
| 101 |  2 |
| 101 |  3 |
| 101 |  4 |
| 102 |  3 |
| 102 |  4 |
| 102 |  5 |
| 103 |  1 |
| 103 |  2 |
| 103 |  3 |
| 104 |  1 |
| 104 |  2 |
| 105 |  1 |
+-----+----+
Strawberry
  • 33,750
  • 13
  • 40
  • 57