A MySQL query to get the latest record for multiple distinct items in a table

Question

I'm having trouble pulling data out of a table: the query takes too long and slows down the system.

Here is the reference table:

CREATE TABLE IF NOT EXISTS `User_Table` (
    `user_table_id`  int unsigned        not null auto_increment,
    `user_id`        int unsigned        not null,
    `unixtimes`      int unsigned        not null,
    `status`         char(1)             not null default '',
    `text1`          text                not null,
    `text2`          text                not null,
PRIMARY KEY (`user_table_id`),
UNIQUE INDEX `User_Table_1` (`user_id`, `status`, `unixtimes`),
UNIQUE INDEX `User_Table_2` (`user_id`, `unixtimes`, `status`)
);

`status` can have one of four values:

'a' = approved
'd' = denied
'p' = pending
's' = skipped

I am trying to join the table to itself multiple times. The end goal is a single record per user containing the user_id (so it can be joined to the user table, among other things), the most recent pending text, and the most recent approved text, if one exists, so that the two can be compared.

| user_id | login | pending_text1 | pending_text2 | current_text1 | current_text2 |
|---------|-------|---------------|---------------|---------------|---------------|
| 8675309 | Bob   | Second entry  | Second other  | First entry   | First other   |

If it makes a difference, there should only ever be one record marked as pending for any given user_id. Pending records are reviewed and updated to either approved or denied. If a new pending record comes in before the existing one has been reviewed, the old record is updated to skipped, leaving just the one pending record.
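The pending/skipped rule above can be sketched as a single transaction. This is a minimal illustration, assuming Python's built-in `sqlite3` as a stand-in for MySQL and a simplified copy of the table; the helper name `submit_pending` is hypothetical and not part of the asker's code.

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL `User_Table`.
SCHEMA = """
CREATE TABLE user_table (
    user_table_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    unixtimes     INTEGER NOT NULL,
    status        TEXT    NOT NULL DEFAULT '',
    text1         TEXT    NOT NULL,
    text2         TEXT    NOT NULL,
    UNIQUE (user_id, status, unixtimes)
)
"""


def submit_pending(conn, user_id, ts, text1, text2):
    """Hypothetical helper: keep at most one pending row per user."""
    with conn:  # one transaction: commit on success, roll back on error
        # An older submission still awaiting review becomes 'skipped'.
        conn.execute(
            "UPDATE user_table SET status = 's' "
            "WHERE user_id = ? AND status = 'p'",
            (user_id,),
        )
        # The new submission is now the only pending row for this user.
        conn.execute(
            "INSERT INTO user_table (user_id, unixtimes, status, text1, text2) "
            "VALUES (?, ?, 'p', ?, ?)",
            (user_id, ts, text1, text2),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
submit_pending(conn, 8675309, 100, "Draft A", "Draft A other")
submit_pending(conn, 8675309, 200, "Draft B", "Draft B other")

history = conn.execute(
    "SELECT unixtimes, status FROM user_table ORDER BY unixtimes"
).fetchall()
print(history)  # [(100, 's'), (200, 'p')]
```

Keeping the UPDATE and the INSERT in one transaction means other sessions should not observe the half-finished state between the two statements.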

It should also be noted that I'm mainly concerned with the latest pending records, and there might not be any approved record. For instance, during the first review there can only be a pending record. That's why I've been using the LEFT JOIN.
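A small sketch of why the LEFT JOIN matters here, using Python's `sqlite3` as a stand-in for MySQL with invented rows in which user 2 has only a pending submission. The join condition is deliberately simplified (it matches every approved row, not just the latest one) so that only the outer-join behaviour is on display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table "
    "(user_id INTEGER, unixtimes INTEGER, status TEXT, text1 TEXT)"
)
conn.executemany(
    "INSERT INTO user_table VALUES (?, ?, ?, ?)",
    [
        (1, 100, "a", "old approved"),
        (1, 200, "p", "new pending"),
        (2, 150, "p", "first submission"),  # user 2 was never approved
    ],
)


def pending_with_approved(join_kind):
    """join_kind is 'JOIN' or 'LEFT JOIN'; it is spliced into the SQL."""
    # The status filter for the approved side sits in ON, so the
    # outer join can still keep pending rows that have no match.
    sql = f"""
        SELECT p.user_id, p.text1, a.text1
        FROM user_table p
        {join_kind} user_table a
          ON a.user_id = p.user_id AND a.status = 'a'
        WHERE p.status = 'p'
        ORDER BY p.user_id
    """
    return conn.execute(sql).fetchall()


inner_rows = pending_with_approved("JOIN")
left_rows = pending_with_approved("LEFT JOIN")
print(inner_rows)  # [(1, 'new pending', 'old approved')]
print(left_rows)   # [(1, 'new pending', 'old approved'), (2, 'first submission', None)]
```

With the inner join, user 2 disappears entirely; with the LEFT JOIN the row is kept and the approved column comes back as `NULL` (`None` in Python).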

Here is a stripped-down version of the query I've been using. It takes at least 4-5 seconds each time, even with fewer than 100k records. Since the volume of records can only go up, I'm hoping to make the query faster.

SELECT
    `up`.`user_id` AS 'user_id',
    `u`.`login` AS 'login',
    `up`.`text1` AS 'pending_text1',
    `up`.`text2` AS 'pending_text2',
    `record_current`.`text1` AS 'current_text1',
    `record_current`.`text2` AS 'current_text2'

FROM
    `user_table` up
JOIN
    `user` u
ON    `up`.`user_id` = `u`.`user_id`

LEFT JOIN (
    SELECT
        `up`.*
    FROM
        `user_table` up
    JOIN (
        SELECT
            `user_id`, MAX(`unixtimes`) unixtimes 
        FROM 
            `user_table` 
        WHERE
            `status` = 'a'
        GROUP BY
            `user_id`) all_approved
    ON 
        `up`.`user_id` = `all_approved`.`user_id` AND `up`.`unixtimes` = `all_approved`.`unixtimes`) record_current
ON 
    `up`.`user_id` = `record_current`.`user_id`

WHERE 
    `up`.`status` = 'p'
ORDER BY 
    `up`.`unixtimes`;
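One possible rewrite, offered as a sketch that has not been tested on MySQL: it drops both derived tables, joins the approved row straight from `user_table` with `status = 'a'` applied to that row, and keeps only the newest one with `NOT EXISTS`. Two things motivate it. First, the `record_current` derived table above filters `status = 'a'` only inside the `MAX()` subquery, so a non-approved row that shares the latest approved `unixtimes` for the same user would also be joined. Second, every lookup here is an equality or range probe that the `(user_id, status, unixtimes)` index can serve. It runs below against SQLite through Python's `sqlite3`, with invented sample rows; whether it is actually faster on MySQL has to be checked with `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `user` (user_id INTEGER PRIMARY KEY, login TEXT NOT NULL);
CREATE TABLE user_table (
    user_table_id INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    unixtimes INTEGER NOT NULL,
    status    TEXT    NOT NULL,
    text1     TEXT    NOT NULL,
    text2     TEXT    NOT NULL,
    UNIQUE (user_id, status, unixtimes)
);
INSERT INTO `user` VALUES (8675309, 'Bob'), (42, 'Alice');
INSERT INTO user_table (user_id, unixtimes, status, text1, text2) VALUES
    (8675309, 100, 'a', 'Zeroth entry', 'Zeroth other'),
    (8675309, 200, 'a', 'First entry',  'First other'),
    (8675309, 300, 'p', 'Second entry', 'Second other'),
    (42,      250, 'p', 'Only entry',   'Only other');
""")

# Plain SQL that should also be valid MySQL (both accept backtick quoting).
LATEST_PENDING_VS_APPROVED = """
SELECT up.user_id, u.login,
       up.text1  AS pending_text1, up.text2  AS pending_text2,
       cur.text1 AS current_text1, cur.text2 AS current_text2
FROM user_table up
JOIN `user` u
  ON u.user_id = up.user_id
LEFT JOIN user_table cur
  ON cur.user_id = up.user_id
 AND cur.status = 'a'
WHERE up.status = 'p'
  -- drop any approved row that has a newer approved row for the same user
  AND NOT EXISTS (
        SELECT 1
        FROM user_table newer
        WHERE newer.user_id = cur.user_id
          AND newer.status = 'a'
          AND newer.unixtimes > cur.unixtimes)
ORDER BY up.unixtimes
"""

rows = conn.execute(LATEST_PENDING_VS_APPROVED).fetchall()
for row in rows:
    print(row)
```

If a user has no approved row, `cur` is `NULL`, the `NOT EXISTS` subquery matches nothing, and the pending row is still returned with empty current columns.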

Any ideas?

Update: I added a second index, `UNIQUE INDEX User_Table_2 (user_id, unixtimes, status)`. Here is the EXPLAIN output for the query:

| id  | select_type | table        | type   | possible_keys                 | key          | key_len | ref                                         | rows  | filtered | Extra                                        |
|-----|-------------|--------------|--------|-------------------------------|--------------|---------|---------------------------------------------|-------|----------|----------------------------------------------|
| 1   | PRIMARY     | up           | ALL    | User_Table_1,User_Table_2     | null         | null    | null                                        | 93858 | 75.0     | Using where; Using temporary; Using filesort |
| 1   | PRIMARY     | u            | eq_ref | PRIMARY                       | PRIMARY      | 4       | dbase.up.user_id                            | 1     | 100.0    |                                              |
| 1   | PRIMARY     | <derived2>   | ALL    | null                          | null         | null    | null                                        | 82793 | 100.0    |                                              |
| 2   | DERIVED     | <derived3>   | ALL    | null                          | null         | null    | null                                        | 82793 | 100.0    |                                              |
| 2   | DERIVED     | up           | ref    | User_Table_1,User_Table_2     | User_Table_2 | 8       | all_approved.user_id,all_approved.unixtimes | 469   | 100.0    |                                              |
| 3   | DERIVED     | User_Table   | range  | null                          | User_Table_1 | 5       | null                                        | 10    | 100.0    | Using where; Using index for group-by        |
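In this plan the `up` row is a full scan (`type` ALL over 93,858 rows) with `Using filesort`, which fits the fact that neither unique index starts with `status`, so neither can serve `status = 'p'` or `ORDER BY unixtimes` directly. The sketch below tries one option, a hypothetical extra index named `user_table_status_time` on `(status, unixtimes)`, and inspects it with SQLite's `EXPLAIN QUERY PLAN` as a stand-in. MySQL's optimizer makes its own choices (a plain `status` index behaved badly in the comments below), so the idea would need to be confirmed with MySQL's own `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_table (
    user_table_id INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    unixtimes INTEGER NOT NULL,
    status    TEXT    NOT NULL,
    text1     TEXT    NOT NULL,
    text2     TEXT    NOT NULL
);
CREATE UNIQUE INDEX User_Table_1 ON user_table (user_id, status, unixtimes);
CREATE UNIQUE INDEX User_Table_2 ON user_table (user_id, unixtimes, status);
""")

# The pending-side part of the asker's query: filter plus sort.
QUERY = "SELECT user_id, text1 FROM user_table WHERE status = 'p' ORDER BY unixtimes"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)
# Hypothetical extra index whose leading column is status.
conn.execute("CREATE INDEX user_table_status_time ON user_table (status, unixtimes)")
after = plan(QUERY)
print(before)
print(after)
```

Before the new index, SQLite reports a table scan plus a temporary B-tree for the sort; afterwards it searches the new index and the sort step goes away, because the index entries for `status = 'p'` are already in `unixtimes` order.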

Related: https://stackoverflow.com/q/28984670/629186 and https://stackoverflow.com/q/17327043/629186 — MivaScott, Apr 26 '19 at 05:24
Do you have an index on `status`? If not, can you create one and see if the query runs faster? If that doesn't help, can you run `explain extended ...your sql...` and paste its result? Other options: switch to ``JOIN `user` u ON `up`.`user_id` = `u`.`user_id` AND `up`.`status` = 'p'`` and remove the `where` clause, and see if that makes a difference. Then comment out the `order by` clause at the bottom and see if that makes a difference. You may find that one or two of these changes help. — zedfoxus, Apr 26 '19 at 05:40
For performance issues, please put `EXPLAIN` before your `SELECT` and then check `possible_keys` and `rows` in the results. That would be a good start for checking your query. — Anton, Apr 26 '19 at 13:59
@zedfoxus Added more details. Moving the WHERE clauses to the JOIN didn't seem to make a noticeable difference. — MivaScott, Apr 27 '19 at 19:26
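That outcome is expected: for an inner join, a condition placed in `ON` and the same condition placed in `WHERE` are logically equivalent, so the optimizer is free to treat them identically. Placement only changes the result for an outer join. A small check with invented rows, using Python's `sqlite3` as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `user` (user_id INTEGER PRIMARY KEY, login TEXT NOT NULL);
CREATE TABLE user_table (user_id INTEGER NOT NULL, status TEXT NOT NULL, text1 TEXT NOT NULL);
INSERT INTO `user` VALUES (1, 'Bob'), (2, 'Alice');
INSERT INTO user_table VALUES (1, 'a', 'x'), (1, 'p', 'y'), (2, 'd', 'z');
""")

# BASE ends with the ON clause, so appending "AND ..." extends ON.
BASE = ("SELECT up.user_id, u.login, up.text1 FROM user_table up "
        "{join} `user` u ON up.user_id = u.user_id")
ORDER = " ORDER BY up.user_id, up.status"

# Inner join: the status filter in WHERE vs. the same filter in ON.
filter_in_where = conn.execute(
    BASE.format(join="JOIN") + " WHERE up.status = 'p'" + ORDER).fetchall()
filter_in_on = conn.execute(
    BASE.format(join="JOIN") + " AND up.status = 'p'" + ORDER).fetchall()

# Outer join: the same ON filter no longer removes rows from the preserved side.
left_filter_in_on = conn.execute(
    BASE.format(join="LEFT JOIN") + " AND up.status = 'p'" + ORDER).fetchall()

print(filter_in_where)    # [(1, 'Bob', 'y')]
print(filter_in_on)       # [(1, 'Bob', 'y')]
print(left_filter_in_on)  # [(1, None, 'x'), (1, 'Bob', 'y'), (2, None, 'z')]
```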
What indexes do you have on `user_table` and what fields are there in the index? — zedfoxus, Apr 27 '19 at 19:32
@zedfoxus I have the indexes listed in the CREATE statement at the top — MivaScott, Apr 27 '19 at 19:34
Could you add an index on status and re-run the query please? — zedfoxus, Apr 27 '19 at 19:39
@zedfoxus Oddly, when I add an index on just `status`, the query takes MUCH longer; I stopped it after 15 seconds. When I took the index out, I was back to 2-4 seconds. — MivaScott, Apr 27 '19 at 20:06
Ah, very interesting! Could you tell me the count of records in user_table where status = p? What's the count of records in user_table where status = a? — zedfoxus, Apr 27 '19 at 20:33
@zedfoxus, a=82797, d=8931, p=2127, s=3. And there's no real difference with the ORDER BY on or off. — MivaScott, Apr 27 '19 at 22:43
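A quick look at what these counts imply. They add up to 93,858, the same figure EXPLAIN gives for the full scan of `up`, so approved rows make up roughly 88% of the table and pending rows only about 2%. One plausible, unverified explanation for the slowdown with a `status`-only index: such an index is a good path to the rare `'p'` rows but a poor one for `status = 'a'` in the `MAX()` subquery, where reading most of the table through a secondary index can cost more than a plain scan.

```python
# Row counts per status, as reported in the comment above.
counts = {"a": 82797, "d": 8931, "p": 2127, "s": 3}

total = sum(counts.values())
# Fraction of the table that each status value selects.
shares = {status: n / total for status, n in counts.items()}

print(total)  # 93858
for status, share in sorted(shares.items()):
    print(f"{status}: {share:.1%}")
```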
Can you remove `from user_table up join user u on up.user_id = u.user_id` and use `from (select ut.user_id, ut.unixtimes, ut.text1, ut.text2, u.login from user_table ut join user u on ut.user_id = u.user_id and ut.status = 'p') up` instead, and change `u.login` to `up.login` in the top-level select? Does that make a difference? — zedfoxus, Apr 27 '19 at 23:03
@zedfoxus I need to keep the user table joined, as it is used for more than just the login. I stripped it all out for the minimal example, but I will have to add back WHERE clauses to check whether `u.active = 1`, what role is assigned to the user, and so forth. — MivaScott, Apr 27 '19 at 23:19
Gotcha. You can certainly move the join with user out of the first subquery. I wanted to see if using the first subquery makes the overall query efficient or not. — zedfoxus, Apr 27 '19 at 23:24
@zedfoxus I lost you on that last bit. I think it would be easier if you submitted an answer with the new query and we went from there. — MivaScott, Apr 27 '19 at 23:49
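A sketch of the restructuring discussed in this thread, under stated assumptions: pending rows are filtered into a small derived table first, the join to `user` stays outside it so extra user filters such as `u.active = 1` can be added, and the newest approval is fetched from `user_table` itself with `status = 'a'` on the fetched row. The sample rows, including the `active` values, are invented; it runs here on SQLite through Python's `sqlite3`, and its MySQL performance is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `user` (user_id INTEGER PRIMARY KEY, login TEXT NOT NULL,
                     active INTEGER NOT NULL);
CREATE TABLE user_table (
    user_id INTEGER NOT NULL, unixtimes INTEGER NOT NULL, status TEXT NOT NULL,
    text1 TEXT NOT NULL, text2 TEXT NOT NULL,
    UNIQUE (user_id, status, unixtimes)
);
INSERT INTO `user` VALUES (8675309, 'Bob', 1), (42, 'Alice', 1);
INSERT INTO user_table VALUES
    (8675309, 100, 'a', 'Zeroth entry', 'Zeroth other'),
    (8675309, 200, 'a', 'First entry',  'First other'),
    (8675309, 300, 'p', 'Second entry', 'Second other'),
    (42,      250, 'p', 'Only entry',   'Only other');
""")

QUERY = """
SELECT up.user_id, u.login,
       up.text1  AS pending_text1, up.text2  AS pending_text2,
       cur.text1 AS current_text1, cur.text2 AS current_text2
FROM (SELECT user_id, unixtimes, text1, text2
        FROM user_table
       WHERE status = 'p') up                 -- small set: pending rows only
JOIN `user` u
  ON u.user_id = up.user_id
LEFT JOIN (SELECT user_id, MAX(unixtimes) AS unixtimes
             FROM user_table
            WHERE status = 'a'
            GROUP BY user_id) latest          -- newest approval time per user
  ON latest.user_id = up.user_id
LEFT JOIN user_table cur                      -- fetch that row, approved only
  ON cur.user_id = latest.user_id
 AND cur.status = 'a'
 AND cur.unixtimes = latest.unixtimes
WHERE u.active = 1                            -- extra user filters go here
ORDER BY up.unixtimes
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The derived `up` table carries `unixtimes` so the outer `ORDER BY` still works, and the approved row is matched on all three columns of the `(user_id, status, unixtimes)` unique key.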
