3

Note: I have seen the answers for SQL Group by Date Range and Group rows by contiguous date ranges for groups of values and both mention the use of row_number(). I have not been able to figure this one out even with the help of these answers, and I've finally caved in and come here for help after a full day of contemplation and research. EDIT: Now that I understand Gordon Linoff's answer, I realize how similar my question was to the others I linked and apologize for the near-duplicate post simply because I didn't understand the answers.

I have an input table containing a history of "results" of an inquiry into a client, which is consolidated into a view by the most recent result for each month of each year. The view looks (schematically) like this:

client_id | month |  year  |  result  
1            2       2016       Y
2            3       2016       N     
1            4       2016       Y
1            5       2016       N
2            4       2016       N

and so I am trying to output a result view like this:

client_id | start_month |  start_year |  end_month  |  end_year  |  result
1                2           2016            3           2016          Y
2                3           2016          (NULL)        (NULL)        N    
1                5           2016          (NULL)        (NULL)        N          

Reasons why I am struggling:

  1. Most explanations of row_number() and how it applies here aren't getting through to me
  2. I am working with, rather than date values, columns derived from month-year type inquires originally created in another view from DATEPART() calls, and as this is SQL 2008, I can't turn these back into dates (like 4-1-2016 instead of 4,2016) easily with DATEFROMPARTS()
Community
  • 1
  • 1
Austin
  • 117
  • 2
  • 12

1 Answers1

3

Your data is exactly the same as the data in the other questions -- except for the fact that the values are split into two columns.

To understand what is happening, start with this query:

select t.*,
       row_number() over (partition by client_id order by start_year, start_month) as seqnum_client,
       (start_year * 12 + start_month) as month_counter
from t
order by client_id, start_year, start_Month;

Then take the difference of the last two columns. You will note that these are constant for contiguous periods of time. That becomes the grouping key.

I'll leave you with this version:

select client_id,
       min(start_year * 100 + start_month) as start_yyyymm,
       max(start_year * 100 + start_month) as end_yyyymm,
       max(result) as result
from (select t.*,
           row_number() over (partition by client_id order by start_year, start_month) as seqnum_client,
           (start_year * 12 + start_month as month_counter
     from t
    ) t
group by client_id, (seqnum_client - month_counter)
order by client_id, max(seqnum_client);

Note: I'm not sure about the exactly logic for Result, but max() returns the values in the question.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you so much! I'll get to studying your queries immediately... seeing an example query with columns I'm familiar with might help me figure out the use of "row_number() over partition by..." that has confused me so much in other answers. – Austin Apr 21 '16 at 16:49
  • It works! I just had to add modify the query slightly to group by result rather than taking the max of it, and it correctly consolidated the rows into contiguous ranges of the same result. I believe I understand how this works now, and it helped to start with examining the output of the subquery like you told me to. I played with it and tweaked it to see how it would change until I understood, if not how it worked, at least what it was doing. – Austin Apr 21 '16 at 22:40
  • A side-note, I was expecting whatever solution I used to create null values for whatever range I came up with that hadn't ended yet (ie, in row 2 and 3 of the example output in the question, the end dates are NULL because the range hasn't ended yet). In this solution, rather than being NULL, the end date ends up being the last month that the inquiry was made and that result seen, and this is perhaps a better outcome as it simplifies the conditions needed to filter the view. – Austin Apr 21 '16 at 22:47