I've read through the answers on MySQL order by before group by but applying it to my query ends up with a subquery in a subquery for a rather simple case so I'm wondering if this can be simplified:
Schema with sample data
For brevity I've omitted the other fields on the members
table. Also, there's many more tables joined in the actual application but those are straightforward to join. It's the membership_stack
table that's giving me issues.
CREATE TABLE members (
id int unsigned auto_increment,
first_name varchar(255) not null,
PRIMARY KEY(id)
);
INSERT INTO members (id, first_name)
VALUES (1, 'Tyler'),
(2, 'Marissa'),
(3, 'Alex'),
(4, 'Parker');
CREATE TABLE membership_stack (
id int unsigned auto_increment,
member_id int unsigned not null,
sequence int unsigned not null,
team varchar(255) not null,
`status` varchar(255) not null,
PRIMARY KEY(id),
FOREIGN KEY(member_id) REFERENCES members(id)
);
-- Algorithm to determine correct team:
-- 1. Only consider rows with the highest sequence number
-- 2. Order statuses and pick the first one found:
-- (active, completed, cancelled, abandoned)
INSERT INTO membership_stack (member_id, sequence, team, status)
VALUES (1, 1, 'instinct', 'active'),
(1, 1, 'valor', 'abandoned'),
(2, 1, 'valor', 'active'),
(2, 2, 'mystic', 'abandoned'),
(2, 2, 'valor', 'completed'),
(3, 1, 'instinct', 'completed'),
(3, 2, 'valor', 'active');
I can't change the database schema because the data is synchronized with an external data source.
Query
This is what I have so far:
SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
FROM membership_stack AS ms
JOIN (
SELECT member_id, MAX(sequence) AS sequence
FROM membership_stack
GROUP BY member_id
) AS t1
ON ms.member_id = t1.member_id
AND ms.sequence = t1.sequence
RIGHT JOIN members AS m
ON ms.member_id = m.id
ORDER BY m.id, FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned');
This works as expected but members may appear multiple times if their "most recent sequence" involves more than one team. What I need to do is aggregate again on id
and select the FIRST row in each group.
However that poses some issues:
- There is no
FIRST()
function in MySQL - This entire resultset would become a subtable (subquery), which isn't a big deal here but the queries are quite big on the application.
- It needs to be compatible with ONLY_FULL_GROUP_BY mode as it is enabled on MySQL 5.7 by default. I haven't checked but I doubt that
FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned')
is considered a functionally dependent field on this resultset. The query also needs to be compatible with MySQL 5.1 as that is what we are running at the moment.
Goal
| id | first_name | sequence | team | status |
|----|------------|----------|----------|-----------|
| 1 | Tyler | 1 | instinct | active |
| 2 | Marissa | 2 | valor | completed |
| 3 | Alex | 2 | valor | active |
| 4 | Parker | NULL | NULL | NULL |
What can I do about this?
Edit: It has come to my attention that some members don't belong to any team. These members should be included in the resultset with null values for those fields. Question updated to reflect new information.