0

I am trying to write a SQL query in MySQL Workbench that will return to me the sums of records I moved to a particular status considering only the latest timestamp for a particular record. I also need to do this without a sub query (or nested select).

Given the below table, I want to know that user with id 1 moved two records to status with id 2. I need to not include in my counts if the same record was moved to two different status ids, but only count the latest status id.

Table

user_id acted_on_record_id moved_to_status_id timestamp
1 1234 2 2022-01-01 19:39:37
1 1234 3 2022-01-02 19:39:37
1 1234 2 2022-01-03 19:39:37
1 5678 2 2022-01-03 19:39:37

Here is the query I have so far:

SELECT t1.user_id, t1.acted_on_record_id,
SUM(DISTINCT IF(t1.moved_to_status_id = 3, 1, 0)) AS pending,
SUM(DISTINCT IF(t1.moved_to_status_id = 2, 1, 0)) AS open,
MAX(t1.timestamp) as timestamp
FROM table1 t1
GROUP BY t1.user_id, t1.acted_on_record_id

This is the result I want:

user_id acted_on_record_id pending open timestamp
1 1234 0 1 2022-01-03 19:39:37
1 5678 0 1 2022-01-03 19:39:37

However, my query gives me this result:

user_id acted_on_record_id pending open timestamp
1 1234 1 1 2022-01-03 19:39:37
1 5678 0 1 2022-01-03 19:39:37

It shows a 1 in both pending and 1 in open columns because the SUM IF aggregates are not mutually exclusive or distinct on the acted_on_record_id. Is there a way to have these two aggregates know about each other and only sum the one with the greater timestamp without using a sub query (nested select)?

graceja
  • 81
  • 5
  • Why do you have `DISTINCT`? It makes no sense in `SUM()`. – Barmar May 24 '22 at 19:40
  • Why can't you use subqueries? Can you use window functions (MySQL 8.x)? – Barmar May 24 '22 at 19:45
  • See https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql?rq=1 for how to get the latest record for each record ID. – Barmar May 24 '22 at 19:46
  • @Barmar, if I take off the DISTINCT inside the SUM(), the record 1234 will get counted twice as open. Why does it not make sense? I cannot use window functions unfortunately, this query needs to run in Spring JPA eventually as a native query and we are using MySQL 5.7 there. I had initially gotten this working with a subquery, but the execution time went from 0.5 seconds to 15 seconds. Efficiency is the reason I would like to avoid a subquery. I will take a look at the link you provided, thanks! – graceja May 24 '22 at 19:53
  • `DISTINCT` doesn't make it use only the latest record. It makes it sum each value of the `IF()` once. – Barmar May 24 '22 at 19:58
  • I don't think there's any way to get the latest row in each group without a subquery or window function. To solve the performance problem, make sure you have a multi-column index on `acted_on_record_id, timestamp` – Barmar May 24 '22 at 20:00
  • @Barmar Thank you for your input, using the link you provided I did eventually figure it out without the use of a subquery. I posted my final solution as an answer. – graceja May 24 '22 at 20:40

1 Answers1

0

I eventually figured it out by expanding on a solution here: Retrieving the last record in each group - MySQL

I used a LEFT JOIN to compare the table against itself. This query returned in 1.095 seconds where my prior solution (not posted) using a subquery returned in 15.268 seconds.

SELECT t1.user_id, t1.acted_on_record_id,
SUM(IF(t1.moved_to_status_id = 3, 1, 0)) AS pending,
SUM(IF(t1.moved_to_status_id = 2, 1, 0)) AS open
MAX(t1.timestamp) as timestamp
FROM table1 t1 LEFT JOIN table1 t2
  ON (t1.acted_on_record_id = t2.acted_on_record_id AND t1.user_id = t2.user_id AND t1.id < t2.id)
WHERE t2.user_id IS NULL
group by t1.user_id, t1.acted_on_record_id, t1.moved_to_status_id
graceja
  • 81
  • 5