4

I've read through the answers on MySQL order by before group by but applying it to my query ends up with a subquery in a subquery for a rather simple case so I'm wondering if this can be simplified:

Schema with sample data

For brevity I've omitted the other fields on the members table. Also, there's many more tables joined in the actual application but those are straightforward to join. It's the membership_stack table that's giving me issues.

CREATE TABLE members (
  id int unsigned auto_increment,
  first_name varchar(255) not null,
  PRIMARY KEY(id)
);

INSERT INTO members (id, first_name)
     VALUES (1, 'Tyler'),
            (2, 'Marissa'),
            (3, 'Alex'),
            (4, 'Parker');

CREATE TABLE membership_stack (
  id int unsigned auto_increment,
  member_id int unsigned not null,
  sequence int unsigned not null,
  team varchar(255) not null,
  `status` varchar(255) not null,
  PRIMARY KEY(id),
  FOREIGN KEY(member_id) REFERENCES members(id)
);

-- Algorithm to determine correct team:
-- 1. Only consider rows with the highest sequence number
-- 2. Order statuses and pick the first one found:
--    (active, completed, cancelled, abandoned)

INSERT INTO membership_stack (member_id, sequence, team, status)
     VALUES (1, 1, 'instinct', 'active'),
            (1, 1, 'valor', 'abandoned'),
            (2, 1, 'valor', 'active'),
            (2, 2, 'mystic', 'abandoned'),
            (2, 2, 'valor', 'completed'),
            (3, 1, 'instinct', 'completed'),
            (3, 2, 'valor', 'active');

I can't change the database schema because the data is synchronized with an external data source.

Query

This is what I have so far:

    SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
      FROM membership_stack AS ms
      JOIN (
    SELECT member_id, MAX(sequence) AS sequence
      FROM membership_stack
  GROUP BY member_id
           ) AS t1
        ON ms.member_id = t1.member_id
       AND ms.sequence = t1.sequence
RIGHT JOIN members AS m
        ON ms.member_id = m.id
  ORDER BY m.id, FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned');

This works as expected but members may appear multiple times if their "most recent sequence" involves more than one team. What I need to do is aggregate again on id and select the FIRST row in each group.

However that poses some issues:

  1. There is no FIRST() function in MySQL
  2. This entire resultset would become a subtable (subquery), which isn't a big deal here but the queries are quite big on the application.
  3. It needs to be compatible with ONLY_FULL_GROUP_BY mode as it is enabled on MySQL 5.7 by default. I haven't checked but I doubt that FIELD(ms.status, 'active', 'completed', 'cancelled', 'abandoned') is considered a functionally dependent field on this resultset. The query also needs to be compatible with MySQL 5.1 as that is what we are running at the moment.

Goal

| id | first_name | sequence |     team |    status |
|----|------------|----------|----------|-----------|
|  1 |      Tyler |        1 | instinct |    active |
|  2 |    Marissa |        2 |    valor | completed |
|  3 |       Alex |        2 |    valor |    active |
|  4 |     Parker |     NULL |     NULL |      NULL |

What can I do about this?

Edit: It has come to my attention that some members don't belong to any team. These members should be included in the resultset with null values for those fields. Question updated to reflect new information.

rink.attendant.6
  • 44,500
  • 61
  • 101
  • 156
  • What if the sequence **and** the status is the same? Example Tyler@valor has status 'active'? – Paul Spiegel Mar 11 '17 at 21:14
  • @PaulSpiegel I've questioned that too but the data provider hasn't gotten back yet on whether that's possible. There should never be more than one `active` in the same sequence simultaneously but I suppose there could be multiple `completed` if they do them in succession in the same sequence. – rink.attendant.6 Mar 11 '17 at 21:26
  • However you should define a "distinct" order. In my solution i use `membership_stack.id ASC` as the last column in ORDER BY clause (just in case). – Paul Spiegel Mar 11 '17 at 21:29

2 Answers2

0

I would do this using variables.

You are looking for the one membership_stack row that is maximal for your special ordering. I'm focusing just on that. The join back to members is trivial.

select ms.*
from (select ms.*,
             (@rn := if(@m = member_id, @rn + 1,
                        if(@m := member_id, 1, 1)
                       )
             ) as rn
      from membership_stack ms cross join
           (select @m := -1, @rn := 0) params
      order by member_id, sequence desc,
               field(ms.status, 'active', 'completed', 'cancelled', 'abandoned')
     ) ms
where rn = 1;

The variables is how the logic is implemented. The ordering is key to getting the right result.

EDIT:

MySQL is quite finicky about LIMIT in subqueries. It is possible that this will work:

select ms.*
from membership_stack ms
where (sequence, status) = (select ms2.sequence, ms2.status
                            from membership_stack ms2
                            where ms2.member_id = ms.member_id
                            order by ms2.member_id, ms2.sequence desc,
                                     field(ms2.status, 'active', 'completed', 'cancelled', 'abandoned')
                            limit 1
                           );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The WHERE with LIMIT in subquery produces an error `Unknown column 'field' in 'where clause'`. The variable one appears to work correctly, would you mind explaining it more so that I can understand and propose the solution to others? – rink.attendant.6 Mar 11 '17 at 21:16
  • @rink.attendant.6 . . . You seem familiar with window functions. That is the MySQL equivalent of `row_number() over (partition by member_id order by field(. . .)`. The variables are keeping track of the state, row-by-row. (The error in the second query was a mistyping of the column name.) – Gordon Linoff Mar 11 '17 at 22:30
  • I've never heard of window functions actually. I've only seen `row_number()` from someone trying to solve this exact problem who had experience with another RDBMS. Anyways I'll discuss this SO question with my team and hopefully accept a solution sometime next week. – rink.attendant.6 Mar 11 '17 at 22:38
0

You can use a correlated subquery in the WHERE clause with LIMIT 1:

SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
FROM members AS m
JOIN membership_stack AS ms ON ms.member_id = m.id
WHERE ms.id = (
    SELECT ms1.id
    FROM membership_stack AS ms1
    WHERE ms1.member_id = ms.member_id
    ORDER BY ms1.sequence desc, 
             FIELD(ms1.status, 'active', 'completed', 'cancelled', 'abandoned'),
             ms1.id asc
    LIMIT 1
)
ORDER BY m.id;

Demo: http://rextester.com/HGU18448

Update

To include members who have no entries in the membership_stack table you should use a LEFT JOIN, and move the subquery condition from the WHERE clause to the ON clause:

SELECT m.id, m.first_name, ms.sequence, ms.team, ms.status
FROM members AS m
LEFT JOIN membership_stack AS ms 
    ON  ms.member_id = m.id
    AND ms.id = (
        SELECT ms1.id
        FROM membership_stack AS ms1
        WHERE ms1.member_id = ms.member_id
        ORDER BY ms1.sequence desc, 
                 FIELD(ms1.status, 'active', 'completed', 'cancelled', 'abandoned'),
                 ms1.id asc
        LIMIT 1
    )
ORDER BY m.id;

Demo: http://rextester.com/NPI79503

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • I've edited my question to include cases where the members might not have any entries in the membership stack. Is there any way to adapt this query for those cases? Or is there any issues with putting the `WHERE` condition in the `ON` clause with a `LEFT JOIN`? – rink.attendant.6 Mar 16 '17 at 18:32
  • @rink.attendant.6 Use a LEFT JOIN. Check the update. – Paul Spiegel Mar 16 '17 at 18:41