0

I am working with a database that includes a history table showing when projects have moved to the next state of development. Here is some example data.

dbo.history_table

| ProjectID | State | Timestamp               |
|         1 |     1 | 2018-03-22 10:38:27.000 |
|         1 |     2 | 2018-03-23 10:22:56.000 |
|         1 |     3 | 2018-03-24 12:18:32.000 |
|         2 |     1 | 2018-03-24 11:01:17.000 |
|         1 |     4 | 2018-03-25 10:32:41.000 |
|         2 |     4 | 2018-03-26 12:39:03.000 |

There are a number of states that look something like this:

| State # | Description     | Notes
| State 1 | Planning        |
| State 2 | Pre-Production  |
| State 3 | Production      |
| State 4 | Post-Production |
| State 5 | Successful      | Terminal state
| State 6 | Unsuccessful    | Terminal state
| State 7 | Cancelled       | Terminal state

These states are roughly sequential, but not quite, and not always. For instance, each project can only end in one of the three terminal states. Furthermore, although all projects begin in State 1, not all projects hit all states before terminating.

I need to write a query that looks at a particular state and shows the next state those records move to. The data I need from this query would look something like this:

Report: State 2 Pipeline

| Total:            | 100 |
| Moved to State 3: |  50 |
| Moved to State 4: |   0 |
| Moved to State 5: |  25 |
| Moved to State 6: |   0 |
| Moved to State 7: | 25  |
Sparrower
  • 25
  • 6
  • 1
    Your history table would definitely benefit from a timestamp that shows when the record was inserted. That would allow you to chronologically show how the project advanced. Without that, you will have to hope that your heap is in the right order. – dfundako Mar 26 '18 at 21:00
  • Without a date parameter in the history table, it'll be very hard to keep them sequential. There's still an answer to be found here, but once a state is set, there is no going back. – Twelfth Mar 26 '18 at 21:01
  • @Twelfth That may not be a correct assumption. What if a project made it from stage 1 to 2, then the business decided they wanted something added on, and it goes back to stage 1? – dfundako Mar 26 '18 at 21:03
  • So what would be the expected output from your sample data? And I think you may need to expand your sample data to cover some of the more "interesting' paths. – Sean Lange Mar 26 '18 at 21:05
  • @dfundako - Heh, that was exactly what I was stating, the design without a date eliminates that functionality for the client...technically they could simply by deleting the state 2 line, but then the history table loses the history of it moving from state 1 to state 2 then back to state 1. If Sparrower treated this as a pivot (pivoit the data to be id,state_1,state_2, etc...) then used a series of case statements to read that, he could technically make this work. Far from ideal – Twelfth Mar 26 '18 at 21:05
  • 1
    @Twelfth Considering OP said they are "roughly sequential, but not quite, and not always", I will assume we are missing a lot of info needed to make an air tight solution. – dfundako Mar 26 '18 at 21:07
  • Some clarifications: I added the timestamp to the history table. The table does have one, I knew I was forgetting something. D'oh. As for the route a project takes through the states, they only move to higher states, but I wouldn't want to rule anything out for the future. – Sparrower Mar 26 '18 at 21:09
  • @dfundako - I'll give ya that...though he has added the date column in an edit which makes this considerably easier. The case statement would have been silly long to read it all. when state_2 = 'yes' and state_3 is null and state_4 is null and state_7 = 'yes' would have had to be one of the many lines – Twelfth Mar 26 '18 at 21:09
  • 1
    @Sparrower - The key word you want to use is lag and lead (and partition by). I suspect the answer here https://stackoverflow.com/questions/18915208/sql-server-lead-lag-analytic-function-across-groups-and-not-within-groups is the right logic for you. heh, the count by type / sum to get percentages is an additional complication but still doable. – Twelfth Mar 26 '18 at 21:12
  • @Sparrower I really don't think your end goal is possible due to how you described the scenario. How can you, for example, look at state 2 and definitively say what the next state is if, in your own words, the states are not sequential, and they don't have to hit all states before terminating? That means state 2 could go to state 3 or state 7 or maybe back to state 1? – dfundako Mar 26 '18 at 21:25
  • @Twelfth Thank you, this looks very promising. I had never heard of LEAD/LAG before, I am playing around with them now. – Sparrower Mar 26 '18 at 21:50
  • @dfundako That is correct, although as of right now states only move to a higher state. It is a very tricky ask. – Sparrower Mar 26 '18 at 21:50

1 Answers1

1

This applies the LEAD logic to find the next row's state and utilizes GROUPING SETS to add the grand total:

with cte as
 (
   select 
      State,
      lead(State) 
      over (partition by ProjectID 
            order by Timestamp) as nextState
   from myTable
 ) 
select 
   case when grouping(nextState) = 1 then 'Total' else nextState end, 
   count(*)
from cte
where State = 2
group by grouping sets ((nextState), ())
order by grouping(nextState) desc, nextState

But it will not return a row for states that don't exist as next state. You need to Left join this select to the table describing the existing states.

dnoeth
  • 59,503
  • 4
  • 39
  • 56