1308

There is a table messages that contains data as shown below:

Id   Name   Other_Columns
-------------------------
1    A       A_data_1
2    A       A_data_2
3    A       A_data_3
4    B       B_data_1
5    B       B_data_2
6    C       C_data_1

If I run the query `select * from messages group by name`, I will get the result as:

1    A       A_data_1
4    B       B_data_1
6    C       C_data_1

What query will return the following result?

3    A       A_data_3
5    B       B_data_2
6    C       C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

SELECT
  *
FROM (SELECT
  *
FROM messages
ORDER BY id DESC) AS x
GROUP BY name

But this looks highly inefficient. Any other ways to achieve the same result?

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Vijay Dev
  • 26,966
  • 21
  • 76
  • 96
  • 4
    see accepted answer in http://stackoverflow.com/questions/1379565/mysql-first-and-last-record-of-a-grouped-record-aggregate-functions for a more efficient solution – o17t H1H' S'k Jun 25 '12 at 12:45
  • 2
    Duplicate of http://stackoverflow.com/q/121387/684229 – Tomas Jun 14 '13 at 20:10
  • 12
    Why can't you just add DESC, i.e. select * from messages group by name DESC – DatsunBing Dec 03 '15 at 06:41
  • Possible duplicate of [How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?](http://stackoverflow.com/questions/612231/how-can-i-select-rows-with-maxcolumn-value-distinct-by-another-column-in-sql) – Ciro Santilli OurBigBook.com Jun 12 '16 at 22:19
  • 5
    @KimPrince It seems like the answer you are suggesting doesn't do what is expected! I just tried your method and it took FIRST row for each group and ordered DESC. It does NOT take the last row of each group – Ayrat May 22 '17 at 15:34
  • For more efficiency, see http://mysql.rjweb.org/doc.php/groupwise_max – Rick James Nov 11 '17 at 01:15
  • See This: https://paulund.co.uk/get-last-record-in-each-mysql-group – Milad Ghiravani Oct 28 '18 at 19:02
  • @DatsunBing your solution is the easiest and perfect one. – Jaspal May 23 '21 at 16:51
  • Came up with an interesting solution to this : https://stackoverflow.com/a/68894680/13457138 –  Aug 23 '21 at 21:59
  • "last record" is not reliable if you do not specify an explicit `ORDER BY`. – dolmen Oct 20 '21 at 08:55
  • @DatsunBing - The inner `ORDER BY` is ignored by the Optimizer. The outer `GROUP BY` violates `ONLY_FULL_GROUP_BY`. – Rick James Feb 21 '22 at 18:26
  • Can somebody please tell me why all databases don't use the only (in my opinion) sane result when using an aggregate function, which is to select the other fields *from the same row*? To not do so not only doesn't make sense to me, but necessitates all these other methods which are *way more complicated*. – Michael Jan 10 '23 at 18:56
  • I was just looking for something different question and I found yours & I tried it in my way.. [answer](https://dbfiddle.uk/-L-bNsyq) – Manoj Feb 26 '23 at 07:19

34 Answers

1361

MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;

This and other approaches to finding groupwise maximal rows are illustrated in the MySQL manual.

Below is the original answer I wrote for this question in 2009:


I write the solution this way:

SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
 ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;

Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that is better at performance given your database.

For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.

I'll write a query to find the most recent post for a given user ID (mine).

First using the technique shown by @Eric with the GROUP BY in a subquery:

SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
            FROM Posts pi GROUP BY pi.owneruserid) p2
  ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;

1 row in set (1 min 17.89 sec)

Even the EXPLAIN analysis takes over 16 seconds:

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             | 
|  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where | 
|  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index | 
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)

Now produce the same query result using my technique with LEFT JOIN:

SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

1 row in set (0.28 sec)

The EXPLAIN analysis shows that both tables are able to use their indexes:

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          | 
|  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists | 
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)

Here's the DDL for my Posts table:

CREATE TABLE `posts` (
  `PostId` bigint(20) unsigned NOT NULL auto_increment,
  `PostTypeId` bigint(20) unsigned NOT NULL,
  `AcceptedAnswerId` bigint(20) unsigned default NULL,
  `ParentId` bigint(20) unsigned default NULL,
  `CreationDate` datetime NOT NULL,
  `Score` int(11) NOT NULL default '0',
  `ViewCount` int(11) NOT NULL default '0',
  `Body` text NOT NULL,
  `OwnerUserId` bigint(20) unsigned NOT NULL,
  `OwnerDisplayName` varchar(40) default NULL,
  `LastEditorUserId` bigint(20) unsigned default NULL,
  `LastEditDate` datetime default NULL,
  `LastActivityDate` datetime default NULL,
  `Title` varchar(250) NOT NULL default '',
  `Tags` varchar(150) NOT NULL default '',
  `AnswerCount` int(11) NOT NULL default '0',
  `CommentCount` int(11) NOT NULL default '0',
  `FavoriteCount` int(11) NOT NULL default '0',
  `ClosedDate` datetime default NULL,
  PRIMARY KEY  (`PostId`),
  UNIQUE KEY `PostId` (`PostId`),
  KEY `PostTypeId` (`PostTypeId`),
  KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
  KEY `OwnerUserId` (`OwnerUserId`),
  KEY `LastEditorUserId` (`LastEditorUserId`),
  KEY `ParentId` (`ParentId`),
  CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;

Note to commenters: If you want another benchmark with a different version of MySQL, a different dataset, or different table design, feel free to do it yourself. I have shown the technique above. Stack Overflow is here to show you how to do software development work, not to do all the work for you.

outis
  • 75,655
  • 22
  • 151
  • 221
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 12
    Really? What happens if you have a ton of entries? For example, if you're working w/ an in-house version control, say, and you have a ton of versions per file, that join result would be massive. Have you ever benchmarked the subquery method with this one? I'm pretty curious to know which would win, but not curious enough to not ask you first. – Eric Aug 21 '09 at 18:19
  • 2
    Could you elaborate a bit the purpose of the condition "WHERE p2.postid IS NULL"? Wouldn't it contradict with the other condition "p1.postid < p2.postid"? – Katherine Chen Jul 25 '21 at 14:59
  • 1
    @KatherineChen, it has to do with the way `LEFT [OUTER] JOIN` works. If that join finds no matches for a given row in `m1`, then it will still return that row `m1`, but all the columns of `m2` will be NULL. – Bill Karwin Jul 25 '21 at 17:32
  • 1
    Thank you for the explanation! So this would lead it to match either the standalone record (m1.name = m2.name and no other ids associated with the same name) or the largest id given the same name (m1.name = m2.name and m1.id > m2.id). – Katherine Chen Jul 27 '21 at 04:10
  • 1
    @KatherineChen, I would describe it as: no other row is found with the same `name` and a greater `id`, therefore `m1` must be the row with the greatest `id` for that given value of `name`. – Bill Karwin Jul 27 '21 at 14:54
  • it would be nice if you benchmarked the windows function version too – ysth Jul 30 '21 at 13:02
  • 3
    @ysth I would hope that the point of Stack Overflow is to demonstrate techniques for readers, so they can be empowered to do more work themselves. The goal is not to do _all_ the work for them. – Bill Karwin Jul 30 '21 at 14:38
  • there is an edge case where the filter criteria for two+ records is exactly the same, then it will return multiple records. e.g. if your data looks like `m1.name = m2.name AND m1.id = m2.id` – WiR3D Nov 15 '21 at 12:55
  • @WiR3D, We assume that `id` is a unique key. If it is not, then you're right, you need to use some other column that can differentiate rows with the same `name`. This is why the ROW_NUMBER() window function is so useful, because you might not have such a column. – Bill Karwin Nov 15 '21 at 15:25
  • 2 cents here, but this was significantly faster than the `max` solution. – StevenNunez Dec 17 '21 at 22:41
  • I realize that this post is rather old, but I happened to look at it today. And I don't think that the "Eric" query is optimized properly. The condition should be (IMO) `ON (p1.owneruserid = p2.owneruserid AND p1.postid = p2.maxpostid`. As it is, I think that it has to generate a maximum `postid` for every user. But you only need it for 20860. You're a prolific poster, but I don't believe that you had a million posts in a single month. The 1384 that you did have should be tractable, as it was in your LEFT JOIN. Your version of "Eric" only works at all because one user per post. – mdfst13 Nov 10 '22 at 03:49
  • @mdfst13 - yes, one could put the same WHERE clause into the subquery. Good point. Anyway, I got used to my LEFT JOIN solution many years ago, because it was so long ago, MySQL didn't even support subqueries. – Bill Karwin Nov 10 '22 at 04:26
178

UPD 2017-03-31: MySQL 5.7.5 enabled the ONLY_FULL_GROUP_BY switch by default (hence, non-deterministic GROUP BY queries became disabled). Moreover, the GROUP BY implementation was updated, and the solution might not work as expected anymore even with the switch disabled. One needs to check.

Bill Karwin's solution above works fine when the item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 comparisons (where n is the group size) just for the IS NULL filtering.

I made my tests on an InnoDB table of 18,684,446 rows with 1,182 groups. The table contains test results for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.

Bill's solution has already been running for several hours on my Dell E4310 and I do not know when it is going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).

I have a couple of other solutions that are based on the same ideas:

  • if the underlying index is BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is the first for each group_id if we walk through the index in descending order;
  • if we read the values which are covered by an index, the values are read in the order of the index;
  • each index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result.
  • in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit explicit ORDER BY for the join)

3 ways MySQL uses indexes is a great article to understand some details.

Solution 1

This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows:

SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id DESC;

If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:

SELECT test_id, request_id
FROM (
    SELECT test_id, MAX(request_id) AS request_id
    FROM testresults
    GROUP BY test_id DESC) as ids
ORDER BY test_id;

This one takes about 1.2 seconds on my data.

Solution 2

Here is another solution that takes about 19 seconds for my table:

SELECT test_id, request_id
FROM testresults, (SELECT @group:=NULL) as init
WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
ORDER BY test_id DESC, request_id DESC

It returns tests in descending order as well. It is much slower since it does a full index scan but it is here to give you an idea how to output N max rows for each group.

The disadvantage of the query is that its result cannot be cached by the query cache.
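The same user-variable trick can be extended to return the last N rows per group rather than just one, by counting rows within each group instead of emitting a 0/1 flag. Below is a sketch for N = 2; note this is illustrative only, since assigning user variables inside expressions is deprecated in MySQL 8.0, so test it on your version before relying on it:

```sql
-- Sketch: keep the last 2 request_ids per test_id.
-- @grp remembers the current group; @rn counts rows within it.
SELECT test_id, request_id
FROM testresults,
     (SELECT @grp := NULL, @rn := 0) AS init
WHERE IF(IFNULL(@grp, -1) = (@grp := test_id),
         @rn := @rn + 1,   -- same group: bump the per-group counter
         @rn := 1)         -- new group: reset the counter to 1
      <= 2
ORDER BY test_id DESC, request_id DESC;
```

As with Solution 2 above, this relies on the rows being read in (test_id, request_id) descending index order, so that the counters are meaningful.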

Rick James
  • 135,179
  • 13
  • 127
  • 222
newtover
  • 31,286
  • 11
  • 84
  • 89
134

Use your subquery to return the correct grouping, because you're halfway there.

Try this:

select
    a.*
from
    messages a
    inner join 
        (select name, max(id) as maxid from messages group by name) as b on
        a.id = b.maxid

If it's not id you want the max of:

select
    a.*
from
    messages a
    inner join 
        (select name, max(other_col) as other_col 
         from messages group by name) as b on
        a.name = b.name
        and a.other_col = b.other_col

This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.

Eric
  • 92,005
  • 12
  • 114
  • 115
  • This works as long as max(id) is unique between groups. Otherwise you can add additional join condition by group identifier as seen here https://stackoverflow.com/a/7745635/3163618 tests http://sqlfiddle.com/#!9/f13270/2/0 – qwr Jun 09 '23 at 16:24
107

I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:

SELECT id, name, other_columns
FROM messages
WHERE id IN (
    SELECT MAX(id)
    FROM messages
    GROUP BY name
);

I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)

This should work both on MySQL and SQL Server.

JYelton
  • 35,664
  • 27
  • 132
  • 191
  • This solution is crashing mysql server / service. I have checked it with 10 million records, not recommend this solution. Using IN in this case is very worst. – Kamlesh Sep 17 '21 at 12:34
  • 1
    @Kamlesh Perhaps you are missing some indexes? Also this solution is nearly 10 years old, maybe some update has changed behavior or performance with this query. – JYelton Sep 17 '21 at 14:42
  • Nice and elegant solution. Just a little improvement to make it work even with not univoque sorting attributes. SELECT not_univoque_id, name, other_columns FROM messages WHERE (name, not_univoque_id) IN ( SELECT name, MAX(not_univoque_id) FROM messages GROUP BY name ); – bytepan Nov 01 '21 at 06:46
  • This even works with Firebird 1.0.3! – Wolf Dec 09 '21 at 12:30
  • Good answer, This link is like your answer https://thispointer.com/retrieving-the-last-record-in-each-group-mysql/ – MrSalesi Apr 20 '23 at 21:50
  • This works as long as MAX(id) is unique among groups. Otherwise you can also match on the pair (MAX(id), Name) if supported by your engine. https://stackoverflow.com/a/7745679/3163618 – qwr Jun 09 '23 at 16:37
55

Solution by subquery (fiddle link):

select * from messages where id in
(select max(id) from messages group by Name)

Solution by join condition (fiddle link):

select m1.* from messages m1 
left outer join messages m2 
on ( m1.id<m2.id and m1.name=m2.name )
where m2.id is null

The purpose of this post is only to provide the fiddle links; the same SQL is already given in other answers.

Community
  • 1
  • 1
Vipin
  • 4,851
  • 3
  • 35
  • 65
17

An approach with considerable speed is as follows.

SELECT * 
FROM messages a
WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)

Result

Id  Name    Other_Columns
3   A   A_data_3
5   B   B_data_2
6   C   C_data_1
Song Zhengyi
  • 339
  • 2
  • 8
17

Let's look at how to use MySQL to get the last record in a GROUP BY of records. For example, if you have this result set of posts:

id   category_id   post_title
1    1             Title 1
2    1             Title 2
3    1             Title 3
4    2             Title 4
5    2             Title 5
6    3             Title 6

I want to be able to get the last post in each category, which are Title 3, Title 5 and Title 6. To group the posts by category you will use the MySQL GROUP BY keyword.

select * from posts group by category_id

But the result we get back from this query is:

id   category_id   post_title
1    1             Title 1
4    2             Title 4
6    3             Title 6

A bare GROUP BY like this returns an arbitrary row per group (here, the first record in each group). To get the row with the highest id in each group instead, use:

SELECT id, category_id, post_title
FROM posts
WHERE id IN (
    SELECT MAX(id)
    FROM posts
    GROUP BY category_id );

This will return the posts with the highest IDs in each group.

id   category_id   post_title
3    1             Title 3
5    2             Title 5
6    3             Title 6


James Risner
  • 5,451
  • 11
  • 25
  • 47
Yagnesh bhalala
  • 1,107
  • 1
  • 15
  • 17
13

Here are two suggestions. First, if your MySQL supports ROW_NUMBER() (MySQL 8.0+), it's very simple:

WITH Ranked AS (
  SELECT Id, Name, Other_Columns,
    ROW_NUMBER() OVER (
      PARTITION BY Name
      ORDER BY Id DESC
    ) AS rk
  FROM messages
)
SELECT Id, Name, Other_Columns
FROM Ranked
WHERE rk = 1;

I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly.

Second, if ROW_NUMBER() isn't available, this is often a good way to proceed:

SELECT
  Id, Name, OtherColumns
FROM messages
WHERE NOT EXISTS (
  SELECT * FROM messages as M2
  WHERE M2.Name = messages.Name
  AND M2.Id > messages.Id
)

In other words, select messages where there is no later-Id message with the same Name.

Steve Kass
  • 7,144
  • 20
  • 26
9

Clearly there are lots of different ways of getting the same results. Your question seems to be: what is an efficient way of getting the last result in each group in MySQL? If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.

We sometimes need to do this with tables with even more than 60 million rows.

For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).

I will use the following tables:

CREATE TABLE temperature(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 
  groupID INT UNSIGNED NOT NULL, 
  recordedTimestamp TIMESTAMP NOT NULL, 
  recordedValue INT NOT NULL,
  INDEX groupIndex(groupID, recordedTimestamp), 
  PRIMARY KEY (id)
);

CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id)); 

The temperature table is populated with about 1.5 million random records, and with 100 different groups. The selected_group is populated with those 100 groups (in our cases this would normally be less than 20% for all of the groups).

As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.

If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:

SELECT 
  last(t1.id) AS id, 
  t1.groupID, 
  last(t1.recordedTimestamp) AS recordedTimestamp, 
  last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
ORDER BY t1.recordedTimestamp, t1.id
GROUP BY t1.groupID;

which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. It would execute in 0 seconds and hence be highly efficient. Note that normally in MySQL we would see an ORDER BY clause following the GROUP BY clause; here, however, the ORDER BY clause is used to determine the order for the last() function. If it came after the GROUP BY, it would be ordering the GROUPS. If no GROUP BY clause were present, the last values would be the same in all of the returned rows.

However MySQL does not have this so let's look at different ideas of what it does have and prove that none of these are efficient.

Example 1

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
  SELECT t2.id
  FROM temperature t2 
  WHERE t2.groupID = g.id
  ORDER BY t2.recordedTimestamp DESC, t2.id DESC
  LIMIT 1
);

This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 2

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM temperature t1
INNER JOIN ( 
  SELECT max(t2.id) AS id   
  FROM temperature t2
  INNER JOIN (
    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
    FROM selected_group g
    INNER JOIN temperature t3 ON t3.groupID = g.id
    GROUP BY t3.groupID
  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
  GROUP BY t2.groupID
) t5 ON t5.id = t1.id;

This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 3

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM temperature t1
WHERE t1.id IN ( 
  SELECT max(t2.id) AS id   
  FROM temperature t2
  INNER JOIN (
    SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
    FROM selected_group g
    INNER JOIN temperature t3 ON t3.groupID = g.id
    GROUP BY t3.groupID
  ) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
  GROUP BY t2.groupID
)
ORDER BY t1.groupID;

This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 4

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
  SELECT max(t2.id)
  FROM temperature t2 
  WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
      SELECT max(t3.recordedTimestamp)
      FROM temperature t3 
      WHERE t3.groupID = g.id
    )
);

This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc

Example 5

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
  SELECT 
    t2.id, 
    t2.groupID, 
    t2.recordedTimestamp, 
    t2.recordedValue, 
    row_number() OVER (
      PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
    ) AS rowNumber
  FROM selected_group g 
  INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;

This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc

Example 6

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM (
  SELECT 
    last_value(t2.id) OVER w AS id, 
    t2.groupID, 
    last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp, 
    last_value(t2.recordedValue) OVER w AS recordedValue
  FROM selected_group g
  INNER JOIN temperature t2 ON t2.groupID = g.id
  WINDOW w AS (
    PARTITION BY t2.groupID 
    ORDER BY t2.recordedTimestamp, t2.id 
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
) t1
GROUP BY t1.groupID;

This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc

Example 7

SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue 
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2 
  ON t2.groupID = g.id 
  AND (
    t2.recordedTimestamp > t1.recordedTimestamp 
    OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
  )
WHERE t2.id IS NULL
ORDER BY t1.groupID;

This one was taking forever so I had to kill it.

Yoseph
  • 730
  • 1
  • 7
  • 8
  • This is a different problem. And the solution is a huge UNION ALL query. – Paul Spiegel Dec 10 '19 at 14:46
  • @PaulSpiegel I guess you are joking about the huge UNION ALL. Besides the fact that one would need to know all of the selected groups in advance, and that with 2,000 selected groups that would be an incredibly huge query, it would perform even worse than the fastest example above, so no, that would not be a solution. – Yoseph Dec 12 '19 at 10:06
  • I'm absolutely serious. I've tested that in the past with a couple of hundred groups. When you need to handle ties in big groups, UNION ALL is the only way in MySQL to force an optimal execution plan. `SELECT DISTINCT(groupID)` is fast and will give you all data that you need to construct such a query. You should be fine with the query size as long as it doesn't exceed `max_allowed_packet`, which defaults to 4MB in MySQL 5.7. – Paul Spiegel Dec 12 '19 at 17:40
7

Here is another way to get the last related record, using GROUP_CONCAT with ORDER BY and SUBSTRING_INDEX to pick one record from the list:

SELECT 
  `Id`,
  `Name`,
  SUBSTRING_INDEX(
    GROUP_CONCAT(
      `Other_Columns` 
      ORDER BY `Id` DESC 
      SEPARATOR '||'
    ),
    '||',
    1
  ) Other_Columns 
FROM
  messages 
GROUP BY `Name` 

The query above groups all the Other_Columns values that are in the same Name group and, using ORDER BY Id DESC, joins them in descending order with the provided separator (in my case I have used ||). SUBSTRING_INDEX over this list then picks the first one.

Fiddle Demo

M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118
7

Hi @Vijay Dev, if your table messages contains an Id column that is an auto-increment primary key, then to fetch the latest record per group based on that primary key your query should read as below:

SELECT m1.*
FROM messages m1
INNER JOIN (SELECT MAX(Id) AS lastmsgId FROM messages GROUP BY Name) m2
  ON m1.Id = m2.lastmsgId
bikashphp
  • 165
  • 3
  • 10
  • 1
    This one the fastest i found – CORSAIR Apr 10 '19 at 12:01
  • This is one is also nice b/c limit and offset can be used in the subquery (or whatever it is called when a query is used in a join). MySQL does not allow limit/offset in typical subqueries, but they are allowed for joins like this. – Charles L. Mar 19 '21 at 00:03
6
SELECT 
  column1,
  column2 
FROM
  table_name 
WHERE id IN 
  (SELECT 
    MAX(id) 
  FROM
    table_name 
  GROUP BY column1) 
ORDER BY column1 ;
M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118
  • 1
    Could you elaborate a bit on your answer? Why is your query preferrable to Vijays original query? – janfoeh May 04 '14 at 11:57
6

You can also take a look at this fiddle:

http://sqlfiddle.com/#!9/ef42b/9

FIRST SOLUTION

SELECT d1.ID,Name,City FROM Demo_User d1
INNER JOIN
(SELECT MAX(ID) AS ID FROM Demo_User GROUP By NAME) AS P ON (d1.ID=P.ID);

SECOND SOLUTION

SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME ;
Shrikant Gupta
  • 131
  • 1
  • 11
6

If you need the most recent or oldest record of a text column in a grouped query, and you would rather not use a subquery, you can do this...

Ex. You have a list of movies and need to get the count in the series and the latest movie

id   series      name
1    Star Wars   A New hope
2    Star Wars   The Empire Strikes Back
3    Star Wars   Return of The Jedi

SELECT COUNT(id), series, SUBSTRING(MAX(CONCAT(id, name)), LENGTH(id) + 1)
FROM Movies
GROUP BY series

This returns...

COUNT(id)   series      name
3           Star Wars   Return of The Jedi

MAX will return the row with the highest value, so by concatenating the id to the name, you now will get the newest record, then just strip off the id for your final result.

More efficient than using a subquery.

So for the given example:

SELECT MAX(Id), Name, SUBSTRING(MAX(CONCAT(Id, Other_Columns)), LENGTH(Id) + 1)
FROM messages
GROUP BY Name

Happy coding, and "May The Force Be With You" :)

  • This may not work, Let's suppose in your example The id of "A New hope" was 9 and the id of "Return of The Jedi" was 10. This is because the MAX function for the Concatenation of the id and name would be doing it on a string. Thus "9A New hope" would be having higher order than "10Return of The Jedi" – Suraj Apr 28 '22 at 07:50
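As the comment above points out, MAX(CONCAT(id, name)) compares strings, so an id of 9 sorts after an id of 10. A common workaround, sketched here under the assumption that ids fit in 10 digits, is to zero-pad the id with LPAD so that string order matches numeric order:

```sql
-- Zero-pad id to a fixed width so the lexicographic MAX agrees
-- with numeric order, then strip the 10-character pad back off.
SELECT
  COUNT(id),
  series,
  SUBSTRING(MAX(CONCAT(LPAD(id, 10, '0'), name)), 11) AS name
FROM Movies
GROUP BY series;
```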
5

This query might help:

SELECT 
  *
FROM 
  message 

WHERE 
  `Id` IN (
    SELECT 
      MAX(`Id`) 
    FROM 
      message 
    GROUP BY 
      `Name`
  ) 
ORDER BY 
   `Id` DESC
Abhishek Sengupta
  • 2,938
  • 1
  • 28
  • 35
5

I've not yet tested with large DB but I think this could be faster than joining tables:

SELECT *, Max(Id) FROM messages GROUP BY Name
Shai
  • 111,146
  • 38
  • 238
  • 371
  • 20
    This returns arbitrary data. In other words, the returned columns might not be from the record with MAX(Id). – harm Jul 03 '14 at 15:05
  • Useful to select the max Id from a set of record with WHERE condition : "SELECT Max(Id) FROM Prod WHERE Pn='" + Pn + "'" It returns the max Id from a set of records with same Pn.In c# use reader.GetString(0) to get the result – Nicola Apr 08 '15 at 09:24
  • Why did this post get positive votes in the first place? In my opinion, it totally misses the point. – Wolf Dec 09 '21 at 13:07
4

Here is my solution:

SELECT 
  DISTINCT NAME,
  MAX(MESSAGES) OVER(PARTITION BY NAME) MESSAGES 
FROM MESSAGE;
Paul Roub
  • 36,322
  • 27
  • 84
  • 93
4
SELECT * FROM table_name WHERE primary_key IN (SELECT MAX(primary_key) FROM table_name GROUP BY column_name )
S.S. Anne
  • 15,171
  • 8
  • 38
  • 76
ShriP
  • 128
  • 6
4

I found the best solution in https://dzone.com/articles/get-last-record-in-each-mysql-group:

select * from `data` where `id` in (select max(`id`) from `data` group by `name_id`)
milad nazari
  • 339
  • 3
  • 5
4

Try this:

SELECT jos_categories.title AS name,
       joined .catid,
       joined .title,
       joined .introtext
FROM   jos_categories
       INNER JOIN (SELECT *
                   FROM   (SELECT `title`,
                                  catid,
                                  `created`,
                                  introtext
                           FROM   `jos_content`
                           WHERE  `sectionid` = 6
                           ORDER  BY `id` DESC) AS yes
                   GROUP  BY `yes`.`catid` DESC
                   ORDER  BY `yes`.`created` DESC) AS joined
         ON( joined.catid = jos_categories.id )  
Brock Adams
  • 90,639
  • 22
  • 233
  • 295
3

If you want the last row for each Name, you can assign a row number to each row, partitioned by Name and ordered by Id in descending order.

QUERY

SELECT t1.Id, 
       t1.Name, 
       t1.Other_Columns
FROM 
(
     SELECT Id, 
            Name, 
            Other_Columns,
    (
        CASE Name WHEN @curA 
        THEN @curRow := @curRow + 1 
        ELSE @curRow := 1 AND @curA := Name END 
    ) + 1 AS rn 
    FROM messages t, 
    (SELECT @curRow := 0, @curA := '') r 
    ORDER BY Name,Id DESC 
)t1
WHERE t1.rn = 1
ORDER BY t1.Id;

SQL Fiddle

Ullas
  • 11,450
  • 4
  • 33
  • 50
3

If performance is really your concern, you can introduce a new column on the table called IsLastInGroup of type BIT.

Set it to true on the rows which are last in their group, and maintain it on every insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case, and I recommend it only if you're read-focused.

So your query will look like:

SELECT * FROM Messages WHERE IsLastInGroup = 1
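One way to maintain such a flag is a trigger; here is a minimal sqlite3 sketch (the trigger name and column default are illustrative, not from the answer, and it assumes rows are inserted in ascending Id order):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Messages (Id INTEGER, Name TEXT, Other_Columns TEXT,
                           IsLastInGroup INTEGER DEFAULT 1);
    -- Hypothetical trigger: each new row demotes the older rows of its group.
    -- Assumes inserts arrive in ascending Id order.
    CREATE TRIGGER flag_last AFTER INSERT ON Messages
    BEGIN
        UPDATE Messages SET IsLastInGroup = 0
        WHERE Name = NEW.Name AND Id <> NEW.Id;
    END;
""")
con.executemany(
    "INSERT INTO Messages (Id, Name, Other_Columns) VALUES (?, ?, ?)",
    [(1, 'A', 'A_data_1'), (2, 'A', 'A_data_2'), (3, 'A', 'A_data_3'),
     (4, 'B', 'B_data_1'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')])

flagged = con.execute(
    "SELECT Id, Name FROM Messages WHERE IsLastInGroup = 1 ORDER BY Id").fetchall()
print(flagged)  # [(3, 'A'), (5, 'B'), (6, 'C')]
```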
michal.jakubeczy
  • 8,221
  • 1
  • 59
  • 63
3

MariaDB 10.3 and newer using GROUP_CONCAT.

The idea is to use ORDER BY + LIMIT:

SELECT GROUP_CONCAT(id ORDER BY id DESC LIMIT 1) AS id,
       name,
       GROUP_CONCAT(Other_columns ORDER BY id DESC LIMIT 1) AS Other_columns
FROM t
GROUP BY name;

db<>fiddle demo

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
3

The below query will work fine as per your question.

SELECT M1.* 
FROM MESSAGES M1,
(
 SELECT SUBSTR(Others_data,1,2),MAX(Others_data) AS Max_Others_data
 FROM MESSAGES
 GROUP BY 1
) M2
WHERE M1.Others_data = M2.Max_Others_data
ORDER BY Others_data;
animuson
  • 53,861
  • 28
  • 137
  • 147
Teja
  • 13,214
  • 36
  • 93
  • 155
2

How about this:

SELECT DISTINCT ON (name) *
FROM messages
ORDER BY name, id DESC;

I had a similar issue (on PostgreSQL, though) with a 1M-record table. This solution takes 1.7s vs. the 44s produced by the LEFT JOIN variant. In my case I also had to filter the counterpart of your name field against NULL values, which improved performance further, to 0.2s.

Azathoth
  • 582
  • 1
  • 7
  • 29
2

Yet another option without subqueries.

This solution uses the MySQL LAST_VALUE window function, exploiting the window function frame (window functions are available in MySQL since 8.0).

SELECT DISTINCT 
    LAST_VALUE(Id)            
        OVER(PARTITION BY Name 
             ORDER     BY Id 
             ROWS BETWEEN 0 PRECEDING 
                      AND UNBOUNDED FOLLOWING),
    Name,
    LAST_VALUE(Other_Columns)            
        OVER(PARTITION BY Name 
             ORDER     BY Id 
             ROWS BETWEEN 0 PRECEDING 
                      AND UNBOUNDED FOLLOWING)
FROM   
    tab

Try it here.

lemon
  • 14,875
  • 6
  • 18
  • 38
2

In my quest for a universal groupwise-max, I've seen many answers and blog posts on the subject. Even my favorite (actually part of a fantastic series on the subject) failed to identify a portable solution, instead diving deep into specifics per RDBMS.

Luckily, a portable solution does exist!

The secondary index you need for this is name. (name, id would be identical, as the primary key is always included implicitly.)

Create groups of the messages, and use a dependent subquery to get the latest row for each group.

SELECT m.*

-- Step 1: Start by obtaining the groupwise maximums
FROM
(
    SELECT (
        -- Step 1b: Find the ID of the group maximum by seeking in the index
        SELECT id
        FROM messages m
        WHERE m.name = groups.name
        ORDER BY m.name DESC, m.id DESC -- Match the index EXACTLY, and indicate direction
        LIMIT 1
    ) AS id

    -- Step 1a: Find the groups by seeking through the index
    FROM messages AS groups
    GROUP BY groups.name
) AS maxes

-- Step 2: For each group, join the max row by ID
-- This neatly separates any potential followup SQL from the groupwise-max tactics
INNER JOIN messages m ON m.id = maxes.id
;

This is portable because it requires only the following combination of building blocks:

  • Indexed GROUP BY.
  • Indexed SELECT with ORDER BY [ASC/DESC] and LIMIT/TOP.
  • Dependent subqueries.

Just be sure to have the correct index: GroupKeyColumn(s), GroupWinnerColumn(s), PrimaryKeyColumn(s).

In OP's case, the group key is name, the group's winner is determined by id, and the primary key is already covered by that, so: name, id.

Many have suggested solutions involving subqueries, but the most overlooked aspect is the highly specific set of ordering clauses that causes the correct index to be used - in the right traversal direction, no less.

Additional Advantages

  • Easily adjustable for min (ASC) vs. max (DESC).
  • Winner per group can be composite, e.g. timestamp, id. (This also allows us to disambiguate non-unique winners, such as "latest timestamp".)
  • Group key can be composite, e.g. company_id, department_name.
  • Easily extended with a WHERE on which groups to select.
  • Easily extended with a WHERE on what items to ignore, both indexed (id >= 1000) and non-indexed (is_deleted = 0).

Why does this work [optimally]?

Imagine leafing through the physical phone book, finding the last entry for each town, i.e. the entry with that town's alphabetically greatest name. How would you do it?

You would start at the very end. The very last entry in the book is the group maximum of the last town. It is the first result row that you encounter.

For each subsequent desired result row, you would binary search backwards, to the next-greatest town. At the point where the current town transitions into its predecessor, there is the predecessor's last row (alphabetically greatest name), i.e. your next result row. Repeat until no more towns.

Loosely speaking, the phone book is like a secondary index on { Town, Name, PhoneNumber }, with PhoneNumber serving as the primary key. (I'm simplifying things for argument's sake, pretending phone numbers are assigned to one person and names form a single column.)

You are effectively doing a reverse seek through the index. By repeatedly jumping to the next town efficiently (thanks to binary search or a B-tree structure), the work is constrained by the number of result rows rather than the total number of rows. This is asymptotically optimal. And thanks to the reverse traversal direction, each town you encounter "starts" with its greatest row, your target. That is important: imagine the absurd amount of needless work if you'd have to scan all rows for a town.

Changing the solution to a groupwise-min is as trivial as changing the traversal direction, i.e. from DESC to ASC.

RDBMS Notes

  • Whereas MySQL 8 correctly shows Using index for this, MySQL 5.7 shows a worrisome Using where; Using index, but it actually performs correctly. (Tested on a huge data set involving very large groups. Tens of thousands of results spread through hundreds of millions of records were obtained in ~3 seconds.)
  • For SQL Server, the syntax is SELECT TOP 1 instead of SELECT ... LIMIT 1.
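As a quick sanity check of portability, the same shape translates directly to sqlite3 (run here against the question's sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other TEXT)")
con.execute("CREATE INDEX ix_messages_name ON messages (name)")  # the secondary index
con.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
    (1, 'A', 'A_data_1'), (2, 'A', 'A_data_2'), (3, 'A', 'A_data_3'),
    (4, 'B', 'B_data_1'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')])

winners = con.execute("""
    SELECT m.*
    FROM (
        SELECT (SELECT id FROM messages mi
                WHERE mi.name = grps.name
                ORDER BY mi.name DESC, mi.id DESC   -- match the index, reverse direction
                LIMIT 1) AS id
        FROM messages AS grps
        GROUP BY grps.name
    ) AS maxes
    JOIN messages m ON m.id = maxes.id
    ORDER BY m.id""").fetchall()

print(winners)  # [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```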
Timo
  • 7,992
  • 4
  • 49
  • 67
1

Another approach:

Find the property with the max m2_price within each program (n properties in 1 program):

select * from properties p
join (
    select program_id, max(m2_price) as max_price 
    from properties 
    group by program_id
) p2 on (p.program_id = p2.program_id)
where p.m2_price = p2.max_price
Ka.
  • 1,189
  • 1
  • 12
  • 18
1

Hope the Oracle query below helps:

WITH Temp_table AS
(
    SELECT id, name, othercolumns,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rank
    FROM messages
)
SELECT id, name, othercolumns FROM Temp_table WHERE rank = 1
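The same ROW_NUMBER() pattern also works on MySQL 8+, PostgreSQL, and SQLite 3.25+; a quick sqlite3 check on the question's data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (Id INTEGER, Name TEXT, Other_Columns TEXT)")
con.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
    (1, 'A', 'A_data_1'), (2, 'A', 'A_data_2'), (3, 'A', 'A_data_3'),
    (4, 'B', 'B_data_1'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')])

# rn = 1 marks the newest (highest Id) row within each Name.
result = con.execute("""
    WITH ranked AS (
        SELECT Id, Name, Other_Columns,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id DESC) AS rn
        FROM messages
    )
    SELECT Id, Name, Other_Columns FROM ranked WHERE rn = 1 ORDER BY Id""").fetchall()

print(result)  # [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```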
Calos
  • 1,783
  • 19
  • 28
kiruba
  • 129
  • 5
1

What about:

select *, max(id) from messages group by name 

I have tested it on sqlite and it returns all columns and max id value for all names.

Jacek Błocki
  • 452
  • 3
  • 9
  • 4
    First, your answer doesn't provides correct result in general for MySQL as the max ID and the rest of the columns could be from different records of the same group. Second, the same answer is already there (https://stackoverflow.com/a/9956875/1089242) like 9 years ago. Third, the question mentioned MySQL specifically. If your concern is about generic sql databases (or just sqlite), please check other questions, e.g. https://stackoverflow.com/q/3800551/1089242 – Lacek Jan 06 '21 at 02:41
  • 1
    I've tested it with sqlite only and it produced correct result. I haven't tried MySQL. The solution was simple, so I've posted it here. Sometimes simple is wrong. Can you share a non-working sqlite example? – Jacek Błocki Jan 09 '21 at 17:04
  • 2
    @JacekBłocki This question is about MySQL. Your answer may work in sqlite (I don't know, I don't have it), but it doesn't work in MySQL. – Brilliand Feb 05 '21 at 20:01
1

As of MySQL 8.0.14, this can also be achieved using Lateral Derived Tables:

SELECT t.*
FROM messages t
JOIN LATERAL (
  SELECT name, MAX(id) AS id 
  FROM messages t1
  WHERE t.name = t1.name
  GROUP BY name
) trn ON t.name = trn.name AND t.id = trn.id

db<>fiddle

id'7238
  • 2,428
  • 1
  • 3
  • 11
1

Here is a more efficient version in one statement; it works as long as the table has a timestamp column.

SELECT Name, SUBSTRING_INDEX(MAX(CONCAT(TimeStamp, ',', Other_Columns)), ',', -1) AS Other_Columns
FROM Messages
GROUP BY Name
ORDER BY Name

This will return, for each group, the "Other_Columns" value of the record with the latest TimeStamp. (Note the trick breaks if Other_Columns itself can contain a comma.)

1

I had a similar issue

Subquery and join to the rescue

SELECT p."Date"
        ,p."Symbol"
        ,p."ratio_roll_qtr_ret"
    FROM PUBLIC."prices_vw" AS p
    JOIN (
        SELECT "Symbol"
            ,max("Date") AS "max"
        FROM PUBLIC."prices_vw"
        GROUP BY "Symbol"
        ) AS sq ON p."Date" = sq."max"
        AND p."Symbol" = sq."Symbol"
    WHERE p."ratio_roll_qtr_ret" IS NOT NULL
    ORDER BY "ratio_roll_qtr_ret" DESC;
thistleknot
  • 1,098
  • 16
  • 38
0

You can group with a count and also get the last item of each group:

SELECT 
    user,
    COUNT(user) AS count,
    MAX(id) as last
FROM request 
GROUP BY user
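A quick sqlite3 check of the count-plus-last pattern (table and data are made up for illustration). This is safe because only aggregates are selected; adding other non-aggregated columns would reintroduce the arbitrary-row problem noted in earlier comments.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE request (id INTEGER, user TEXT)")
con.executemany("INSERT INTO request VALUES (?, ?)",
                [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B'), (6, 'C')])

# One row per user: how many requests, and the id of the latest one.
stats = con.execute("""
    SELECT user, COUNT(user) AS count, MAX(id) AS last
    FROM request GROUP BY user ORDER BY user""").fetchall()

print(stats)  # [('A', 3, 3), ('B', 2, 5), ('C', 1, 6)]
```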
Amir Fo
  • 5,163
  • 1
  • 43
  • 51