2

I have a table which keeps the status and created_timestamp corresponding to a process_id. One row is inserted as soon as the process status changes. So, there are as many rows with same process_id as there are statuses associated with it. I want to create another table/view using this data which has one single row corresponding to a process_id, its current status and its previous status. I need to make a Informatica job for this, but a SQL query will be just as helpful.
Sample Input:

Process_id | Status     | Created 
         1 | In_queue   | 2014-08-01 00:01:01 
         1 | Started    | 2014-08-01 01:03:01 
         1 | In_process | 2014-08-01 01:50:20 
         1 | Complete   | 2014-08-01 03:10:20 

Sample Output: 
Process_id | Previous_status | Current_status | Updated 
         1 | In_process      | Complete       | 2014-08-01 03:10:20
Cœur
  • 37,241
  • 25
  • 195
  • 267
Drunk Knight
  • 131
  • 1
  • 2
  • 14
  • What do you mean by "make a computer science job for this"? SQL is computer science. – Barmar Jul 31 '14 at 12:06
  • Possible duplicate of http://stackoverflow.com/questions/20849098/mysql-how-get-value-in-previous-row-and-value-in-next-row – mTorres Jul 31 '14 at 12:09
  • @mTorres That question is about knowing the ID of the row you want the adjacent rows for. It doesn't really address getting the newest row and the previous before that. – Barmar Jul 31 '14 at 12:12
  • Here's a question about getting the latest 2 rows for each group: http://stackoverflow.com/questions/24457442/how-to-find-previous-record-n-per-group-maxtimestamp-timestamp You just need to use that method, and then pivot the result to get both statuses in the same row. – Barmar Jul 31 '14 at 12:13
  • See http://stackoverflow.com/questions/7674786/mysql-pivot-table for how to pivot the results. And http://stackoverflow.com/questions/2129693/mysql-using-limit-within-group-by-to-get-n-results-per-group has lots more examples of getting the latest N rows per group. – Barmar Jul 31 '14 at 12:14
  • I think this problem is deceptively tricky – Strawberry Jul 31 '14 at 12:35
  • @Barmar I meant Informatica... I don't know how the hell did it end up as computer science – Drunk Knight Jul 31 '14 at 14:45

2 Answers2

1
SELECT Process_id, Previous_status, Current_status, Updated 
FROM (
    SELECT
    Process_id, 
    Status AS Current_status, 
    Created AS Updated,
    @prev_state AS Previous_status,
    @prev_state := Status 
    FROM
    your_table t
    , (select @prev_state := null) var_init
    WHERE Process_id = 1
    ORDER BY Created
) sq

UPDATE:

To do it for all Process_ids and just get the latest record for each Process_id you can use this:

SELECT sq.Process_id, sq.Previous_status, sq.Current_status, sq.Updated 
FROM (
    SELECT
    Process_id, 
    Status AS Current_status, 
    Created AS Updated,
    @prev_state := if(@prev_process != Process_id, null, @prev_state),
    @prev_state AS Previous_status,
    @prev_state := Status,
    @prev_process := Process_id
    FROM
    your_table t
    , (select @prev_state := null, @prev_process := null) var_init
    ORDER BY Process_id, Created
) sq
INNER JOIN (
    SELECT Process_id, MAX(Created) AS max_created
    FROM your_table
    GROUP BY Process_id
) max_c
ON sq.Process_id = max_c.Process_id AND sq.Updated = max_c.max_created
fancyPants
  • 50,732
  • 33
  • 89
  • 96
  • This works perfectly. Since, the data volume is significant, I'd rather go with this method only. Two questions though: 1. How can we do it for all process_ids instead of mentioning only 1? 2. I have one more columns like status..... Update on that also need to be captured. Is it possible in the same query? – Drunk Knight Aug 01 '14 at 09:17
  • Updated my answer to reflect your first question. For second question just add more variables as needed. – fancyPants Aug 01 '14 at 09:32
  • Hey... its for process_id=1 only... you can slightly edit the filter to correct it. Other than that, it's the right word for this solution is "cool" :) Thanks a lot.! – Drunk Knight Aug 01 '14 at 09:45
  • Yes, copy&paste error. I also added Process_id to the `ORDER BY` clause. This is important! – fancyPants Aug 01 '14 at 09:48
0

Performance aside, here's how I'd do it...

SELECT a.process_id
     , a.status
     , a.created
     , b.status prev_status
     , b.created prev_created
  FROM 
     ( SELECT x.*
            , COUNT(*) rank
         FROM my_table x
         JOIN my_table y
           ON y.process_id = x.process_id
          AND y.created >= x.created
        GROUP
           BY x.process_id
            , x.created
     ) a
  LEFT
  JOIN
     ( SELECT x.*
            , COUNT(*) rank
         FROM my_table x
         JOIN my_table y
           ON y.process_id = x.process_id
          AND y.created >= x.created
        GROUP
           BY x.process_id
            , x.created
     ) b
    ON b.process_id = a.process_id
 WHERE b.rank = a.rank + 1
   AND a.rank = 1;

On a larger dataset, I'd probably go with a more fancyPantsy solution

Strawberry
  • 33,750
  • 13
  • 40
  • 57