2

I have a table which contains an ID and a Date for an event. Each row is for one date. I am trying to determine consecutive date ranges and consolidate output to show the ID,StartDate,EndDate

ID      Date
200236  2011-01-02 00:00:00.000
200236  2011-01-03 00:00:00.000
200236  2011-01-05 00:00:00.000
200236  2011-01-06 00:00:00.000
200236  2011-01-07 00:00:00.000
200236  2011-01-08 00:00:00.000
200236  2011-01-09 00:00:00.000
200236  2011-01-10 00:00:00.000
200236  2011-01-11 00:00:00.000
200236  2011-01-12 00:00:00.000
200236  2011-01-13 00:00:00.000
200236  2011-01-15 00:00:00.000
200236  2011-01-16 00:00:00.000
200236  2011-01-17 00:00:00.000

Output would look like:

ID       StartDate    EndDate
200236   2011-01-02   2011-01-03
200236   2011-01-05   2011-01-13
200236   2011-01-15   2011-01-17

Any thoughts on how to handle this in SQL Server 2000?

Andriy M
  • 76,112
  • 17
  • 94
  • 154

5 Answers5

1
SELECT ...
FROM   ...
WHERE  date_column BETWEEN '2011-01-02' AND '2011-01-15'

perhaps? Reference

Or you can do a sub-query and link the next record using a MAX where date is <= current date:

SELECT id, date, (SELECT MAX(date) FROM mytable WHERE date <= mytable.date) AS nextDate
FROM   mytable

Or use:

SELECT TOP 1 date
FROM         mytable
WHERE        date <= mytable.date AND id <> mytable.id
ORDER BY     date

As the sub-query so it grabs the next date in line after the current record.

Brad Christie
  • 100,477
  • 16
  • 156
  • 200
  • The data listed is just a sample of the table. There are thousands of entries (rows) and I was providing an example. There are multiple IDs and each ID can have anywhere from 1 to infinity dates. – George Gonzola May 05 '11 at 17:00
  • @GeorgeGonzola: So use a sub-select and reference the current row's date to find the next date (then pull that value as your third column). – Brad Christie May 05 '11 at 17:02
1

I've just done this similar thing in SQL Server 2008. I think the following translation will work for SQL Server 2000:

-- Create table variable
DECLARE @StartTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  startDate date
)

Insert Into @StartTable(userid, startDate)
--This finds the start dates by finding unmatched values
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID 
   And DateAdd(day, 1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] Is NULL
ORDER BY t1.ID, t1.[Date]

-- Create table variable
DECLARE @EndTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  endDate date
)

Insert Into @EndTable(userid, endDate)
--This finds the end dates by getting unmatched values 
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID
   And DateAdd(day, -1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] IS NULL
ORDER BY t1.ID, t1.[Date]

Select eT.userid, startDate, endDate 
From @EndTable eT
INNER JOIN @StartTable sT On eT.userid = sT.userid 
AND eT.rowid = sT.rowid;

So as you can see, I created two table variables, one for starts and one for ends, by self-joining the table on the date either just prior to or just after the date in the [Date] column. This means that I'm selecting only records that don't have a date prior (so these would be at the beginning of a period) for the Start Table and those that have no date following (so these would be at the end of a period) for the End Table.

When these are inserted into the table variable, they are numbered in sequence because of the Identity column. Then I join the two table variables together. Because they are ordered, the start and end dates should always match up properly.

This solution works for me because I have at most one record per ID per day and I am only interested in days, not hours, etc. Even though it is several steps, I like it because it is conceptually simple and eliminates matched records without having cursors or loops. I hope it will work for you too.

Kit Z. Fox
  • 644
  • 1
  • 11
  • 24
  • Nice, I like the efficient use of temporary tables. By the way, I'm not sure how much this solution could have in common with your similar thing in SQL Server 2008, but you might want to have a look at [this answer](http://stackoverflow.com/questions/5662545/how-do-i-group-on-continuous-ranges/5662607#5662607) (just in case). – Andriy M May 10 '11 at 00:15
  • Thanks! My solution is similar to that one; in fact it's much simpler to use CTEs and partitioning to accomplish this task, but unfortunately, those things are not available in SQL Server 2000. – Kit Z. Fox May 10 '11 at 00:20
0

This SO Question might help you. I linked directly to Rob Farley's answer as I feel this is a similar problem.

Community
  • 1
  • 1
Caley Woods
  • 4,707
  • 4
  • 29
  • 38
0

One approach you can take is to add a field that indicates the next date in the sequence. (Either add it to your current table or use a temporary table, store the underlying data to the temp table and then update the next date in the sequence).

Your starting data structure would look something like this:

ID, PerfDate, NextDate
200236, 2011-01-02, 2011-01-03
200236, 2011-01-03, 2011-01-04
etc.

You can then use a series of correlated subqueries to roll the data up into the desired output:

SELECT ID, StartDate, EndDate
FROM (
SELECT DISTINCT ID, PerfDate AS StartDate, 
    (SELECT MIN([PerfDate]) FROM [SourceTable] S3
    WHERE S3.ID = S1.ID
    AND S3.NextDate > S1.PerfDate
    AND ISNULL(
        (SELECT MIN(PerfDate) 
        FROM [SourceTable] AS S4
        WHERE S4.ID = S1.ID 
        AND S4.NextDate > S3.NextDate), S3.NextDate + 1) > S3.NextDate) AS EndDate
FROM [SourceTable] S1
WHERE 
    ISNULL(
        (SELECT MAX(NextDate) 
        FROM [SourceTable] S2 
        WHERE S2.ID = S1.ID 
        AND S2.PerfDate < S1.PerfDate), PerfDate -1) < S1.PerfDate)q
ORDER BY q.ID, q.StartDate
Tim Lentine
  • 7,782
  • 5
  • 35
  • 40
0

This is the way I've done it in the past. It's a two step process:

  1. Build the set of candidate contiguous periods
  2. If there are any overlapping periods, delete all but the longest such period.

Here's a script that shows how it's done. You might be able to pull it off in a single [bug, ugly] query, but trying to do that makes my head hurt. I'm using temp tables as it makes the debugging a whole lot easier.

drop table #source
create table #source
(
  id    int      not null ,
  dtCol datetime not null ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 1: Each date must be unique for a given ID value.
  -----------------------------------------------------------------------
  unique clustered ( id , dtCol ) ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 2: The datetime column only represents a day.
  -- The value of the time component is always 00:00:00.000
  -----------------------------------------------------------------------
  check ( dtCol = convert(datetime,convert(varchar,dtCol,112),112) ) ,

)
go

insert #source values(1,'jan 1, 2011')
insert #source values(1,'jan 4, 2011')
insert #source values(1,'jan 5, 2011')
insert #source values(2,'jan 1, 2011')
insert #source values(2,'jan 2, 2011')
insert #source values(2,'jan 3, 2011')
insert #source values(2,'jan 5, 2011')
insert #source values(3,'jan 1, 2011')
insert #source values(4,'jan 1, 2011')
insert #source values(4,'jan 2, 2011')
insert #source values(4,'jan 3, 2011')
insert #source values(4,'jan 4, 2011')
go

insert #source values( 200236 , '2011-01-02')
insert #source values( 200236 , '2011-01-03')
insert #source values( 200236 , '2011-01-05')
insert #source values( 200236 , '2011-01-06')
insert #source values( 200236 , '2011-01-07')
insert #source values( 200236 , '2011-01-08')
insert #source values( 200236 , '2011-01-09')
insert #source values( 200236 , '2011-01-10')
insert #source values( 200236 , '2011-01-11')
insert #source values( 200236 , '2011-01-12')
insert #source values( 200236 , '2011-01-13')
insert #source values( 200236 , '2011-01-15')
insert #source values( 200236 , '2011-01-16')
insert #source values( 200236 , '2011-01-17')
go

drop table #candidate_range
go
create table #candidate_range
(
  rowId   int      not null identity(1,1) ,
  id      int      not null ,
  dtFrom  datetime not null ,
  dtThru  datetime not null ,
  length  as 1+datediff(day,dtFrom,dtThru) ,

  primary key nonclustered ( rowID ) ,
  unique clustered (id,dtFrom,dtThru) ,

)
go

--
-- seed the candidate range table with the set of all possible contiguous ranges for each id
--
insert #candidate_range ( id , dtFrom , dtThru )
select id      = tFrom.id    ,
       valFrom = tFrom.dtCol ,
       valThru = tThru.dtCol
from #source tFrom
join #source tThru on tThru.id     = tFrom.id
                  and tThru.dtCol >= tFrom.dtCol
where 1+datediff(day,tFrom.dtCol,tThru.dtCol) = ( select count(*)
                                                  from #source t
                                                  where t.id = tFrom.id
                                                    and t.dtCol between tFrom.dtCol and tThru.dtCol
                                                )
order by 1,2,3
go

--
-- compare the table to itself. If we find overlapping periods,
-- we'll keep the longest such period and delete the shorter overlapping periods.
--
delete t2
from #candidate_range t1
join #candidate_range t2 on t2.id      = t1.id
                        and t2.rowId  != t1.rowID
                        and t2.length <  t1.length
                        and t2.dtFrom <= t1.dtThru
                        and t2.dtThru >= t1.dtFrom
go

That's about all there is to it.

Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135