1

I'd like to show the latest action from each task. This is the table (dummy data but same structure) :

//t_task
task_id             task_name
     A1              PC  Proc
     A2         Printer  Proc
     A3       Stationery Proc

//t_task_d
task_id           assigned_to
     A1                  John
     A1                 Sally
     A2                  John
     A3                 Sally

//t_act
no       act_id       act_date     task_id
 1           C1     2017-07-10          A1
 2           C2     2017-07-14          A1
 3           C3     2017-07-17          A1
 4           C1     2017-07-21          A2

//t_act_d
act_id       act_name
    C1      Surveying 
    C2       Contract
    C3      Execution

From the above tables, I'd like to create some kind of report. This is my expected output:

no            task_name   dates_of_act       status
 1              PC Proc     2017-07-17    Execution
 2         Printer Proc     2017-07-21    Surveying
 3      Stationery Proc         -           Pending /*if it's NULL, then it should be pending, but I can change this in the PHP section*/

This is my closest current query:

SELECT 
    t_task.task_name, 
    DATE(t_act.act_date) AS 'dates_of_act', 
    t_act_d.act_name 
FROM t_task 

INNER JOIN t_task_d ON t_task.task_id = t_task_d.task_id
LEFT OUTER JOIN t_act ON t_task.task_id = t_act.task_id 
LEFT OUTER JOIN t_act_d ON t_act.act_id = t_act_d.act_id

GROUP BY t_task.task_id 
ORDER BY t_act.act_date ASC

My query result is:

no            task_name   dates_of_act       status
 1              PC Proc     2017-07-10    Surveying
 2         Printer Proc     2017-07-21    Surveying
 3      Stationery Proc         -           Pending

Note
I prefer speed because the data is huge. I also try to avoid subqueries if possible

Shota
  • 515
  • 3
  • 18

1 Answers1

1

I think this does it.

select
  t_task.task_id,
  t_task.task_name,
  latest_action.act_date,
  IFNULL(t_act_d.act_name, 'Pending') as act_name
from
  t_task
  left outer join (
    select
      @row_num := IF(@prev_value=concat_ws('', t_act.task_id),@row_num+1, 1) as row_number,
      t_act.task_id,
      t_act.act_id,
      t_act.act_date,
      @prev_value := concat_ws('', t_act.task_id) as z
    from
      t_act,
      (select @row_num := 1) x,
      (select @prev_value := '') y
    order by
      t_act.task_id,
      t_act.act_date desc
  ) as latest_action on
    t_task.task_id = latest_action.task_id
  left outer join t_act_d on
    latest_action.act_id = t_act_d.act_id
where
  latest_action.row_number = 1 or
  latest_action.row_number is null
order by
  case when latest_action.act_date is null then '9999-01-01' else latest_action.act_date end

The results from the data you provided are:

+---------+-----------------+------------+-----------+
| task_id | task_name       | act_date   | act_name  |
+---------+-----------------+------------+-----------+
| A1      | PC  Proc        | 2017-07-17 | Execution |
| A2      | Printer  Proc   | 2017-07-21 | Surveying |
| A3      | Stationery Proc | NULL       | Pending   |
+---------+-----------------+------------+-----------+

I'm more familiar with T-SQL, where I'd use the row_number() window function. The idea is to have the row_number field show a ranking of each row in terms of whether it's the most recent (value 1), second most recent (value 2) etc. action for each task. The most recent action for each task winds up with a row_number of 1, so you can get these out by filtering on row_number = 1 from this latest_action subquery.

Because the latest_action subquery is run once overall, not once per row, it is not much of a performance hit. Unfortunately, I can't promise that the whole variable setting / incrementing thing isn't much of a performance hit, this is the first time I've used this logic in MySQL, I don't know how performant it is.

The logic for how to reproduce T-SQL's row_number() functionality came from here: ROW_NUMBER() in MySQL

Jeff Breadner
  • 1,366
  • 9
  • 19
  • well, this is quite complicated. Is it fast enough to handle hundred thousand of data? – Shota Jul 27 '17 at 04:21
  • Hundreds of thousands of rows should not be a problem. Tens of millions might be. It would be a lot cleaner if it was in Microsoft SQL server, but the problem that you're trying to solve requires some sort of ranking function like this, from what I understand. If you knew that there would only be one action per timestamp, then there might be an easier solution. If you can't guarantee that, then this is probably the best you're going to get. – Jeff Breadner Jul 27 '17 at 04:26
  • The best way to find out is to try it ;) – Jeff Breadner Jul 27 '17 at 04:27