Get first/last n records per group by

Question

I have two tables : tableA (idA, titleA) and tableB (idB, idA, textB) with a one to many relationship between them. For each row in tableA, I want to retrieve the last 5 rows corresponding in tableB (ordered by idB).

I've tried

SELECT * FROM tableA INNER JOIN tableB ON tableA.idA = tableB.idA LIMIT 5

but it's just limiting the global result of INNER JOIN whereas I want to limit the result for each different tableA.id

How can I do that ?

Thanks

For the last 5 per... Is there a basis for last 5 in table B, or just the last 5 based on "idB" which would appear to be an auto-increment sequence column. If date based, what column would that be... — DRapp, Apr 17 '12 at 12:42
Similar questions: http://stackoverflow.com/questions/4688664/mysql-select-n-records-base-on-group-by and http://stackoverflow.com/questions/5319643/top-n-per-group-with-multiple-table-joins and maybe http://stackoverflow.com/q/7539548 - an extension of [this](http://stackoverflow.com/q/8748986) and [this](http://stackoverflow.com/q/1313120) — Tomas, Apr 17 '12 at 13:29
Check the [`[greatest-n-per-group]`](http://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql) tag or the question under **Related**, on the right. — ypercubeᵀᴹ, Apr 17 '12 at 13:40

Tomas · Answer 1 · 2012-04-17T17:08:12.903

5

Much simplified and corrected Carlos solution (his solution would return first 5 rows, not last...):

SELECT tB1.idA, tB1.idB, tB1.textB
FROM tableB as tB1
    JOIN tableB as tB2
        ON tB1.idA = tB2.idA AND tB1.idB <= tB2.idB
GROUP BY tB1.idA, tB1.idB
HAVING COUNT(*) <= 5

In MySQL, you may use tB1.textB even if it is group by query, because you are grouping by the idB in the first table, so there is only single value of tB1.textB for each group...

edited Apr 17 '12 at 17:08

answered Apr 17 '12 at 16:49

Tomas

57,621
49
238
373

Just a note, just not to forget it: it is still question which solution would peform better - the join solution is in principle quadratic, while the rank solution is linear. That said, it is quite likely that the SQL optimizer will make the quadratic problem down to linear, while no more optimizations can be expected from the ranking solution. The same applies to solutions of [this similar question](http://stackoverflow.com/q/8748986). – Tomas Apr 17 '12 at 17:17
My general experience in MySQL is, if you can rewrite the subquery / ranking query to join, it will be faster.. – Tomas Apr 17 '12 at 17:18

score 2 · Accepted Answer · edited Apr 17 '12 at 16:34

2

I think this is what you need:

SELECT tableA.idA, tableA.titleA, temp.idB, temp.textB
FROM tableA
INNER JOIN
(
    SELECT tB1.idB, tB2.idA,
    (
        SELECT textB
        FROM tableB
        WHERE tableB.idB = tB1.idB
    ) as textB
    FROM tableB as tB1
        JOIN tableB as tB2
            ON tB1.idA = tB2.idA AND tB1.idB >= tB2.idB
    GROUP BY tB1.idA, tB1.idB
    HAVING COUNT(*) <= 5
    ORDER BY idA, idB
) as temp
ON tableA.idA = temp.idA

More info about this method here:

http://www.sql-ex.ru/help/select16.php

edited Apr 17 '12 at 16:34

Tomas

57,621
49
238
373

answered Apr 17 '12 at 13:25

Karol

7,803
9
49
67

1

Why do you have the 3rd nested select (`select textB from tableB where tableB.idB = tB1.idB`)?? This subquery can be replaced just with `tB1.textB`! – Tomas Apr 17 '12 at 16:32
and replace `inner join` with `join`. Don't make it a bigger mystery than it is :-) – Tomas Apr 17 '12 at 16:33
2

@user1336526, I've simplified and corrected this Carlos' solution, see my answer. – Tomas Apr 17 '12 at 16:50
if can follow this article , it is working on good on millions of records – Ali Abbas Sep 27 '17 at 13:22

DRapp · Answer 3 · 2012-04-17T13:26:51.690

Ensure your "B" table has an index on ( idA, idB ) for optimized order by purposes so for each "A" ID, it can quickly have the "B" order descending thus putting the newest to the top PER EACH "A" ID. Using the MySQL variables, every time the "A" ID changes, it resets the rank back to 1 for the next "A" id.

select 
      B.idA,
      B.idB,
      B.textB
      @RankSeq := if( @LastAGroup = B.idA, @RankSeq +1, 1 ) ARankSeq,
      @LastAGroup := B.idA as ignoreIt
   from
      tableB B
         JOIN tableA A
            on B.idA = A.idA,
      (select @RankSeq := 0, @LastAGroup := 0 ) SQLVars 
   having
      ARankSeq <= 5
   order by
      B.idA,
      B.idB DESC

Ugh! Nasty query :-) Look at [much simpler variant](http://stackoverflow.com/a/4688699/684229). Anyway there is a big disadvantage in performance - MySQL must walk through all the rows. — Tomas, Apr 17 '12 at 17:07

score -1 · Answer 4 · edited Apr 18 '12 at 23:17

-1

select * from tablea ta, tableb tb
where ta.ida=tb.idb and tb.idb in 
(select top 5 idb from tableB order by idb asc/desc)

(asc if you want lower ids desc if you want higher ids)
less complicated and easy to include more conditions
if top clause is not present in mysql use limit clause (I don't have much knowledge abt mysql)

edited Apr 18 '12 at 23:17

SamStar

335
2
17

answered Apr 17 '12 at 13:50

Amogh Rai

159
1
7

He's asking for the top 5 in tableB for each row in tableA. This just restricts tableB to its top 5 rows, ordered by idb. Also you need to use LIMIT for MySQL, not TOP. – Aaron Apr 17 '12 at 21:18
thank you for your comments i realise my mistake......will this work ....select * from tablea ta, tableb tb where ta.ida=tb.idb and tb.idb in (select idb from tableB,tableA tableB.idb=tableA.ida order by idb asc/desc limit 5).. – Amogh Rai Apr 18 '12 at 06:30

Get first/last n records per group by

4 Answers4

Linked