1

IT IS NOT THE SAME QUESTION AS : Using LIMIT within GROUP BY to get N results per group?

but i admit it is similar.

I need to select the first 2 rows per person. the rows are ordered by Year received

Problem : there is a possibility than 2 data were entered the same month (Date is entered YYYY-MM)

The query I came with (following the referred question) is stuck in an BIG loop.

SELECT *
FROM `table_data` as b
WHERE (
    SELECT count(*) FROM `table_data` as a
        WHERE a.Nom = b.Nom and a.year < b.year
    ) <= 2;

Sample Data :

  A  |   year   |  Nom
---------------------
  b  |  2011-01 | Tim
---------------------
  d  |  2011-01 | Tim
---------------------
  s  |  2011-01 | Tim
---------------------
  a  |  2011-03 | Luc
---------------------
  g  |  2011-01 | Luc
---------------------
  s  |  2011-01 | Luc

Should export :

  A  |   year   |  Nom
---------------------
  b  |  2011-01 | Tim
---------------------
  d  |  2011-01 | Tim
---------------------
  a  |  2011-03 | Luc
---------------------
  g  |  2011-01 | Luc
Dharman
  • 30,962
  • 25
  • 85
  • 135
Charles Forest
  • 1,035
  • 1
  • 12
  • 30

1 Answers1

1
(
 -- First get a set of results as if you only wanted the latest entry for each
 -- name - a simple GROUP BY from a derived table with an ORDER BY
  SELECT *
  FROM (
    SELECT *
    FROM `table_data`
    ORDER BY `year` DESC
  ) `a`
  GROUP BY `Nom`
)
UNION
(
 -- Next union it with the set of result you get if you apply the same criteria
 -- and additionally specify that you do not want any of the rows found by the
 -- first operation
  SELECT *
  FROM (
    SELECT *
    FROM `table_data`
    WHERE `id` NOT IN (
      SELECT `id`
      FROM (
        SELECT *
        FROM `table_data`
        ORDER BY `year` DESC
      ) `a`
      GROUP BY `Nom`
    )
    ORDER BY `year` DESC
  ) `b`
  GROUP BY `Nom`
)
 -- Optionally apply ordering to the final results
ORDER BY `Nom` DESC, `year` DESC

I feel sure there is a shorter way of doing it but right now I can't for the life of me work out what it is. That does work, though - assuming you have a primary key (which you should) and that it is called id.

DaveRandom
  • 87,921
  • 11
  • 154
  • 174
  • since the table got a lot of records, i cannot use that... (sorry for delay), it takes way too lons :S – Charles Forest Jun 22 '12 at 18:11
  • i upvoted still, it did worked (took 15 minutes to process, but it did what i asked) – Charles Forest Jun 22 '12 at 18:52
  • @CharlesForest Bill Karwin's solution [here](http://stackoverflow.com/questions/1442527/how-to-select-the-newest-four-items-per-category/1442867#1442867) does [work](http://sqlfiddle.com/#!2/d4ade/5) and do what you want, but it falls down if a higher primary key does not mean a newer data row. Can you show your full table and a larger, real data sample? – DaveRandom Jun 22 '12 at 19:37
  • i can't since it got some personnal informations... i would if i could - thanks for the pointer, but since data comes from an uncontrolled source, i fear i will be forced to go via php to solve my issue... – Charles Forest Jun 22 '12 at 19:58
  • ±10k right now, but it is going to be much bigger over the years. – Charles Forest Jun 22 '12 at 20:19
  • Well the important point is, does a larger `id` (PK) mean a newer date, or not? I'm trying to come up with something involving a temp table and `INSERT INTO ... SELECT` Plus if the above query took 15 minutes to process, I would suggest that you indexing needs... adjusting... even then I can't see how it took that long, it's only a few `SELECT`s... – DaveRandom Jun 22 '12 at 20:29
  • id doesn't mean newer result. ---- the delay may be related to poor server performances. i was reading about tmp tables, never used them. – Charles Forest Jun 22 '12 at 20:32
  • Nope, I've looked at this from every angle and I just can't come up with a more efficient way to do it. The root of the problem is that because your primary key is not a a monotonically increasing pseudokey (i.e. higher value means newer row) you need to apply sorting *before* grouping, and SQL does not permit `ORDER BY` before `GROUP BY`. This means that you need to create a temporary or derived table with the correct ordering applied before running the actual select/group operation - which puts straight use of the `LEFT OUTER JOIN` approach out immediately. – DaveRandom Jun 22 '12 at 22:30
  • You can do it *slightly* quicker (probably) by creating a temp table (see [this](http://sqlfiddle.com/#!2/3be54/1)) but really what you need to do is adjust the way you store the data in the table so the rows are numerically keyed in date order. This doesn't have to be the primary key, you could add a unique key column to keep track of it, but you would still have to calculate the row's relative position at insert time, and I can't really advise you further without inspecting you application more deeply. – DaveRandom Jun 22 '12 at 22:33