1

I'm querying a table that contains state transitions for a state engine. The table is set up so that it has the previous_state, current_state, and timestamp of the transition, grouped by unique ids.

My goal is to find a sequence of target intervals, defined as timestamp of the initial state transition (eg timestamp when we shift from from 1->2), and timestamp of the target next state transition that matches a specific condition (eg the next timestamp that current_state=3 OR current_state=4).

state_transition_table
+------------+---------------+-----------+----+
| prev_state | current_state | timestamp | id |
+------------+---------------+-----------+----+
|          1 |             2 |       4.5 |  1 |
|          2 |             3 |       5.2 |  1 |
|          3 |             1 |       5.4 |  1 |
|          1 |             2 |      10.3 |  1 |
|          2 |             5 |      10.4 |  1 |
|          5 |             4 |      10.8 |  1 |
|          4 |             1 |      11.0 |  1 |
|          1 |             2 |      12.3 |  1 |
|          2 |             3 |      13.5 |  1 |
|          3 |             1 |      13.6 |  1 |
+------------+---------------+-----------+----+

Within a given id, we want to find all intervals that start with 1->2 (easy enough query), and end with either state 3 or 4. 1->2->anything->3 or 4

An example output table given the input above would have the three states and the timestamps for when we transition between the states:

target output
+------------+---------------+------------+-----------+-----------+
| prev_state | current_state | end_state  | curr_time | end_time  |
+------------+---------------+------------+-----------+-----------+
|          1 |             2 |          3 |       4.5 |       5.2 |
|          1 |             2 |          4 |      10.3 |      10.8 |
|          1 |             2 |          3 |      12.3 |      13.5 |
+------------+---------------+------------+-----------+-----------+

The best query I could come up with is using window functions in a sub-table, and then creating the new columns from that table. But this solution only finds the next row following the initial transition, and doesnt allow other states to occur between then and when our target state arrives.

WITH state_transitions as (
SELECT
  id
  previous_state, current_state,
  LEAD(current_state) OVER ( PARTITION BY id ORDER BY timestamp) AS end_state,
  timestamp as curr_time,
  LEAD(timestamp) OVER ( PARTITION BY id ORDER BY timestamp) AS end_time
FROM
  state_transition_table

SELECT
  previous_state,
  current_state,
  end_state,
  curr_time,
  end_time
FROM state_transitions
WHERE previous_state=1 and current_state=2
ORDER BY curr_time

This query would incorrectly give the second output row end_state==5, which is not what I am looking for.

How can one search a table for the next row that matches my target condition, eg end_state=3 OR end_state=4?

cogit0
  • 13
  • 4
  • What is a "target interval"? Those two terms are not used in the definition of your data. I find the explanation hard to follow. – Gordon Linoff May 03 '19 at 22:11
  • I clarify what a target interval is in the first paragraph, as the pair of timestamps where the first timestamp is an initial state transition (eg 1->2), and the second timestamp is the next timestamp after the initial state that the state changes to states 3 or 4. Editing to clarify. – cogit0 May 03 '19 at 23:15

1 Answers1

1

This requires a recursive query that checks each row against siblings. This query should account for more than three rows. I assumed ORACLE for the seed data, may need to adapt your syntax to your database engine. I tried to document the query as best as I thought it was needed.

WITH /*SEED DATA*/
  state_transition_table(prev_state, current_state, time_stamp, id) as (
              SELECT          1 ,             2 ,       4.5 ,  1 --FROM DUAL
    UNION ALL SELECT          2 ,             3 ,       5.2 ,  1 --FROM DUAL
    UNION ALL SELECT          3 ,             1 ,       5.4 ,  1 --FROM DUAL
    UNION ALL SELECT          1 ,             2 ,      10.3 ,  1 --FROM DUAL
    UNION ALL SELECT          2 ,             5 ,      10.4 ,  1 --FROM DUAL
    UNION ALL SELECT          5 ,             4 ,      10.8 ,  1 --FROM DUAL
    UNION ALL SELECT          4 ,             1 ,      11.0 ,  1 --FROM DUAL
    UNION ALL SELECT          1 ,             2 ,      12.3 ,  1 --FROM DUAL
    UNION ALL SELECT          2 ,             3 ,      13.5 ,  1 --FROM DUAL
    UNION ALL SELECT          3 ,             1 ,      13.6 ,  1 --FROM DUAL
)

/*THE END STATES YOU ARE LOOKING FOR*/
, end_states (a_state) as (
              select 3 --FROM DUAL
    union all select 4 --FROM DUAL
)

/*ORDER THE STEPS TO USE THE order_id COLUMN TO EVALUATE THE NEXT NODE*/
, ordered_states as (
    SELECT row_number() OVER (ORDER BY time_stamp)  order_id
         , prev_state
         , current_state
         , id
         , time_stamp
    FROM   state_transition_table
)

/*RECURSIVE QUERY WITH ANSI SYNTAX*/
, recursive (
           root_order_id
         , order_id
         , time_stamp
         , prev_state
         , current_state
         --, id

         , steps
  )
as (
    SELECT order_id root_order_id /*THE order_id OF EACH ROOT ROW*/
         , order_id
         , time_stamp
         , prev_state
         , current_state

         , CAST(order_id as char(100)) as steps /*INITIAL VALIDATION PATH*/
    FROM   ordered_states
    WHERE  prev_state = 1 AND current_state = 2 /*INITIAL CONDITION*/

    UNION ALL
    SELECT prev.root_order_id
         , this.order_id
         , this.time_stamp
         , prev.prev_state
         , this.current_state

         , CAST(CONCAT(CONCAT(RTRIM(LTRIM(prev.steps)), ', '), RTRIM(LTRIM(CAST(this.order_id as char(3))))) as char(100)) as steps
    FROM   recursive prev /*ANSI PSEUDO TABLE*/
         , ordered_states this /*THE SIBLING ROW TO CHECK*/

    WHERE prev.order_id = this.order_id - 1 /*ROW TO PREVIOUS ROW JOIN*/
      and prev.current_state not in (select a_state from end_states) /*THE PREVIOUS ROW STATE IS NOT AN END STATE */
)

select init_state.prev_state
     , init_state.current_state as mid_state /*this name is better, I think*/
     , end_state.current_state
     , init_state.time_stamp as initial_time /*initial_time is better, I think*/
     , end_state.time_stamp as end_time /*end_time is better, I think*/
     , recursive.steps as validation_path_by_order_id
from   recursive
inner join ordered_states init_state
    on init_state.order_id = recursive.root_order_id
inner join ordered_states end_state
    on end_state.order_id = recursive.order_id
where  recursive.current_state in (select a_state from end_states)

One final note. The resulting columns are only accounting for 3 rows (prev_state, mid_state and current_state). As I said above, there are cases where you can have a path from (1) to (2) to (3 or 4) with more than three rows, lets say 1 to 2 to 5 to 2 to 3, thus the mid_state is really just one state in the middle.

Final-final note: Your desired results table was wrong, but you corrected it.

JPortillo
  • 543
  • 3
  • 9
  • Excellent, thank you! I didn't know that recursive queries could be done in SQL. Updated the output table with correct outputs; I had shifted the input table to be more clear in what the states were, and didnt update the output table. – cogit0 May 03 '19 at 23:46
  • I've been trying to run it on my postgres table, but finding that the Oracle-isms don't work, such as the 'dual' temporary table for `end_states`, or `rownum` as the `order_id`. I'm not sure how to replicate this solution in SQL, as it seems like the `order_id` is what the recursion is iterating over. – cogit0 May 06 '19 at 20:09
  • Oracle's dual is not needed in Postgres. https://stackoverflow.com/questions/30303638/equivalence-of-from-dual-in-postgresql#30303679. Rownum is different too https://stackoverflow.com/questions/3959692/rownum-in-postgresql#3959748 – JPortillo May 07 '19 at 02:02