This answer was originally inspired by Get records with max value for each group of grouped SQL results
but then I looked further and came up with the correct solution.
CREATE TABLE T
(`id` int, `from` varchar(7), `to` varchar(7), `somedate` datetime)
;
INSERT INTO T
(`id`, `from`, `to`, `somedate`)
VALUES
(1, 'Number1', 'Number2', '2015-01-01 00:00:00'),
(2, 'Number2', 'Number1', '2015-01-02 00:00:00'),
(3, 'Number2', 'Number1', '2015-01-03 00:00:00'),
(4, 'Number3', 'Number1', '2015-01-04 00:00:00'),
(5, 'Number3', 'Number2', '2015-01-05 00:00:00');
Tested on MySQL 5.6.19
SELECT *
FROM
(
SELECT *
FROM T
ORDER BY LEAST(`to`,`from`), GREATEST(`to`,`from`), somedate DESC
) X
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
Result set
id from to somedate
3 Number2 Number1 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
But, this relies on some shady behavior of MySQL, which will be changed in future versions. MySQL 5.7 rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY
is disabled), it works like the previous versions, but still it is not guaranteed: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."
So, the correct answer seems to be this:
SELECT T.*
FROM
T
INNER JOIN
(
SELECT
LEAST(`to`,`from`) AS LowVal,
GREATEST(`to`,`from`) AS HighVal,
MAX(somedate) AS MaxDate
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) v
ON T.somedate = v.MaxDate
AND (T.From = v.LowVal OR T.From = v.HighVal)
AND (T.To = v.LowVal OR T.To = v.HighVal)
Result set is the same as above, but in this case it is guaranteed to stay like this, while before you could easily get different date and id for row Number2, Number1
, depending on what indexes you have on the table.
It will work as expected until you have two rows in the original data that have exactly the same somedate
and to
and from
.
Let's add another row:
INSERT INTO T (`id`, `from`, `to`, `somedate`)
VALUES (6, 'Number1', 'Number2', '2015-01-03 00:00:00');
The query above would return two rows for 2015-01-03
:
id from to somedate
3 Number2 Number1 2015-01-03
6 Number1 Number2 2015-01-03
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
To fix this we need a method to choose only one row in the group. In this example we can use unique ID
to break the tie. If there are more than one rows in the group with the same maximum date we will choose the row with the largest ID.
The inner-most sub-query called Groups
simply returns all groups, like original query in the question. Then we add one column id
to this result set, and we choose id
that belongs to the same group and has highest somedate
and then highest id
, which is done by ORDER BY
and LIMIT
. This sub-query is called GroupsWithIDs
. Once we have all groups and an id
of the correct row for each group we join
this to the original table to fetch the rest of the column for found id
s.
final query
SELECT T.*
FROM
(
SELECT
Groups.N1
,Groups.N2
,
(
SELECT T.id
FROM T
WHERE
LEAST(`to`,`from`) = Groups.N1 AND
GREATEST(`to`,`from`) = Groups.N2
ORDER BY T.somedate DESC, T.id DESC
LIMIT 1
) AS id
FROM
(
SELECT LEAST(`to`,`from`) AS N1, GREATEST(`to`,`from`) AS N2
FROM T
GROUP BY LEAST(`to`,`from`), GREATEST(`to`,`from`)
) AS Groups
) AS GroupsWithIDs
INNER JOIN T ON T.id = GroupsWithIDs.id
final result set
id from to somedate
4 Number3 Number1 2015-01-04
5 Number3 Number2 2015-01-05
6 Number1 Number2 2015-01-03