6

Suppose I have following table in Sql Server 2008:

ItemId StartDate   EndDate
1      NULL        2011-01-15
2      2011-01-16  2011-01-25
3      2011-01-26  NULL

As you can see, this table has StartDate and EndDate columns. I want to validate data in these columns. Intervals cannot conflict with each other. So, the table above is valid, but the next table is invalid, becase first row has End Date greater than StartDate in the second row.

ItemId StartDate   EndDate
1      NULL        2011-01-17
2      2011-01-16  2011-01-25
3      2011-01-26  NULL

NULL means infinity here.

Could you help me to write a script for data validation?

[The second task]

Thanks for the answers. I have a complication. Let's assume, I have such table:

ItemId  IntervalId StartDate   EndDate
1       1          NULL        2011-01-15
2       1          2011-01-16  2011-01-25
3       1          2011-01-26  NULL
4       2          NULL        2011-01-17
5       2          2011-01-16  2011-01-25
6       2          2011-01-26  NULL

Here I want to validate intervals within a groups of IntervalId, but not within the whole table. So, Interval 1 will be valid, but Interval 2 will be invalid.

And also. Is it possible to add a constraint to the table in order to avoid such invalid records?

[Final Solution]

I created function to check if interval is conflicted:

CREATE FUNCTION [dbo].[fnIntervalConflict]
(
    @intervalId INT,
    @originalItemId INT,
    @startDate DATETIME,
    @endDate DATETIME
)
RETURNS BIT
AS
BEGIN

    SET @startDate = ISNULL(@startDate,'1/1/1753 12:00:00 AM')
    SET @endDate = ISNULL(@endDate,'12/31/9999 11:59:59 PM')

    DECLARE @conflict BIT = 0

    SELECT TOP 1 @conflict = 1
    FROM Items
    WHERE IntervalId = @intervalId
    AND ItemId <> @originalItemId
    AND (
    (ISNULL(StartDate,'1/1/1753 12:00:00 AM') >= @startDate 
     AND ISNULL(StartDate,'1/1/1753 12:00:00 AM') <= @endDate)
     OR (ISNULL(EndDate,'12/31/9999 11:59:59 PM') >= @startDate 
     AND ISNULL(EndDate,'12/31/9999 11:59:59 PM') <= @endDate)
    )

    RETURN @conflict
END

And then I added 2 constraints to my table:

ALTER TABLE dbo.Items ADD CONSTRAINT
    CK_Items_Dates CHECK (StartDate IS NULL OR EndDate IS NULL OR StartDate <= EndDate)

GO

and

ALTER TABLE dbo.Items ADD CONSTRAINT
    CK_Items_ValidInterval CHECK (([dbo].[fnIntervalConflict]([IntervalId], ItemId,[StartDate],[EndDate])=(0)))

GO

I know, the second constraint slows insert and update operations, but it is not very important for my application. And also, now I can call function fnIntervalConflict from my application code before inserts and updates of data in the table.

Egor4eg
  • 2,678
  • 1
  • 22
  • 41
  • You can use p.petkovs answer still - just add IntervalId into the Join conditions – Damien_The_Unbeliever May 17 '11 at 14:07
  • You can't add this as a 100% working constraint on the table - there are various workarounds (usually involving triggers or UDFs) that may work 95% of the time, but unfortunately, SQL Server's support for constraints involving multiple rows are severely lacking - unless we can change your data model a bit, and add some additional constraints (such as requiring that there are no gaps in dates between rows), and it'll make manipulation look wierd, so you'd always be writing SQL, couldn't generally expect e.g. an ORM to maintain the structure. – Damien_The_Unbeliever May 17 '11 at 14:38
  • 1
    These things are always much more complicated than they seem and unless you clearly define how intervals should be working, you will find some of the solutions proposed here will break depending on your data. In this case, @petkov 's solution doesn't handle open-intervals. I can recommend this [book](http://www.cs.arizona.edu/people/rts/tdbbook.pdf) now out of print, hence available from the author's website. A cursory read-through will give you plenty of information on how to think about this sort of querying (and give hints as to some of the larger pitfalls) – Runonthespot May 17 '11 at 16:22

5 Answers5

4

Something like this should give you all overlaping periods

SELECT
* 
FROM
mytable t1 
JOIN mytable t2 ON t1.EndDate>t2.StartDate AND t1.StartDate < t2.StartDate 

Edited for Adrians comment bellow

Petko Petkov
  • 730
  • 3
  • 7
3

This will give you the rows that are incorrect.

Added ROW_NUMBER() as I didnt know if all entries where in order.

-- Testdata
declare @date datetime = '2011-01-17'

;with yourTable(itemID, startDate, endDate)
as
(
    SELECT  1,  NULL, @date
    UNION ALL
    SELECT  2,  dateadd(day, -1, @date),    DATEADD(day, 10, @date)
    UNION ALL
    SELECT  3,  DATEADD(day, 60, @date),    NULL
)

-- End testdata

,tmp
as
(
    select  *
            ,ROW_NUMBER() OVER(order by startDate) as rowno 
    from    yourTable
)

select  *
from    tmp t1
left join   tmp t2
    on t1.rowno = t2.rowno - 1
where   t1.endDate > t2.startDate

EDIT: As for the updated question:

Just add a PARTITION BY clause to the ROW_NUMBER() query and alter the join.

-- Testdata
declare @date datetime = '2011-01-17'

;with yourTable(itemID, startDate, endDate, intervalID)
as
(
    SELECT  1,  NULL, @date, 1
    UNION ALL
    SELECT  2,  dateadd(day, 1, @date), DATEADD(day, 10, @date),1
    UNION ALL
    SELECT  3,  DATEADD(day, 60, @date),    NULL,   1
    UNION ALL
    SELECT  4,  NULL, @date, 2
    UNION ALL
    SELECT  5,  dateadd(day, -1, @date),    DATEADD(day, 10, @date),2
    UNION ALL
    SELECT  6,  DATEADD(day, 60, @date),    NULL,   2
)

-- End testdata

,tmp
as
(
    select  *
            ,ROW_NUMBER() OVER(partition by intervalID order by startDate) as rowno 
    from    yourTable
)

select  *
from    tmp t1
left join   tmp t2
    on t1.rowno = t2.rowno - 1
    and t1.intervalID = t2.intervalID
where   t1.endDate > t2.startDate
Johan
  • 1,152
  • 7
  • 16
  • +1 I like your solution. However, there are a few "gotchas": - Records have to be ordered AND sequential (no gaps) - OP has to clarify: what if End Date is 17 and the next Start Date is 19? 18 is missing. Is that a valid table? – Adriano Carneiro May 17 '11 at 13:37
  • 1
    The OP only stated that an overlap is a conflict. I wouldn't consider the existance of a gap to be a 'gotcha'. However, if one record is `x -> NULL` and another record is `x + 1 -> y`, the current code would not identify it. An additional case would also be 'masked' : `1 -> 10`, `2 -> 4`, `5 -> 7`. With these records, the second record would be shown as conflicting with the first. The third record, however, would not be shown as conflicting; thus requiring an itterative fixes and check approach. – MatBailie May 17 '11 at 13:48
  • If End Date is 17 and the next Start Date is 19. 18 is missing, but intervals do not conflict. So, this is a valid case. – Egor4eg May 17 '11 at 13:50
  • Excellent points. However, about the gaps, I think OP wants to find the record given a date (been there, done that). Then gaps are not welcome. You know how I solved that? In the app. Couldn't find a way to validate in SQL. That's why I find this question so interesting. – Adriano Carneiro May 17 '11 at 13:53
  • You are right. been there, done that. But if a record has not been found, done nothing. – Egor4eg May 17 '11 at 13:59
  • But in my app, there HAD to be something. I was hoping I would find an answer to this old problem. Just posted my old solution as answer... – Adriano Carneiro May 17 '11 at 14:18
  • @Adrian - there is a way using SQL Server 2008. First, model your periods as a semi-open interval (i.e. a period is valid when `GETDATE() >= StartDate` and `< EndDate`). As in OPs model, your first record has a null StartDate, last record has null EndDate. Create Unique index on IntervalID, StartDate, and a seperate index on IntervalID, EndDate. Finally, you set up Foreign Keys such that one periods IntervalID, StartDate references previous periods IntervalID, EndDate, and vice versa. Check constraint `StartDate < EndDate` You'll quickly find you need MERGE to maintain these FK chains... – Damien_The_Unbeliever May 17 '11 at 14:49
  • I am hopeful that there is! I would really like to see the solution to this old problem. As far as now, though, none of the answers posted here are – Adriano Carneiro May 17 '11 at 14:58
  • @Adrian - since it's not directly related to the OPs question, I'll post it as a CW - I'll have to write it again from scratch though, so it'll take a wee while. – Damien_The_Unbeliever May 17 '11 at 15:59
  • @Adrian - I've now added an example of what you can achieve once you're on SQL Server 2008. Hopefully it's relevant to your situation, or you can see ways it can be modified to fit. – Damien_The_Unbeliever May 17 '11 at 18:28
2
declare @T table (ItemId int, IntervalID int, StartDate datetime,   EndDate datetime)

insert into @T
select 1, 1,  NULL,        '2011-01-15' union all
select 2, 1, '2011-01-16', '2011-01-25' union all
select 3, 1, '2011-01-26',  NULL        union all
select 4, 2,  NULL,        '2011-01-17' union all
select 5, 2, '2011-01-16', '2011-01-25' union all
select 6, 2, '2011-01-26',  NULL

select T1.*
from @T as T1
  inner join @T as T2
    on coalesce(T1.StartDate, '1753-01-01') < coalesce(T2.EndDate, '9999-12-31') and
       coalesce(T1.EndDate, '9999-12-31') > coalesce(T2.StartDate, '1753-01-01') and
       T1.IntervalID = T2.IntervalID and
       T1.ItemId <> T2.ItemId

Result:

ItemId      IntervalID  StartDate               EndDate
----------- ----------- ----------------------- -----------------------
5           2           2011-01-16 00:00:00.000 2011-01-25 00:00:00.000
4           2           NULL                    2011-01-17 00:00:00.000
Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
2

Not directly related to the OP, but since Adrian's expressed an interest. Here's a table than SQL Server maintains the integrity of, ensuring that only one valid value is present at any time. In this case, I'm dealing with a current/history table, but the example can be modified to work with future data also (although in that case, you can't have the indexed view, and you need to write the merge's directly, rather than maintaining through triggers).

In this particular case, I'm dealing with a link table that I want to track the history of. First, the tables that we're linking:

create table dbo.Clients (
    ClientID int IDENTITY(1,1) not null,
    Name varchar(50) not null,
    /* Other columns */
    constraint PK_Clients PRIMARY KEY (ClientID)
)
go
create table dbo.DataItems (
    DataItemID int IDENTITY(1,1) not null,
    Name varchar(50) not null,
    /* Other columns */
    constraint PK_DataItems PRIMARY KEY (DataItemID),
    constraint UQ_DataItem_Names UNIQUE (Name)
)
go

Now, if we were building a normal table, we'd have the following (Don't run this one):

create table dbo.ClientAnswers (
    ClientID int not null,
    DataItemID int not null,
    IntValue int not null,
    Comment varchar(max) null,
    constraint PK_ClientAnswers PRIMARY KEY (ClientID,DataItemID),
    constraint FK_ClientAnswers_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
    constraint FK_ClientAnswers_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID)
)

But, we want a table that can represent a complete history. In particular, we want to design the structure such that overlapping time periods can never appear in the database. We always know which record was valid at any particular time:

create table dbo.ClientAnswerHistories (
    ClientID int not null,
    DataItemID int not null,
    IntValue int null,
    Comment varchar(max) null,

    /* Temporal columns */
    Deleted bit not null,
    ValidFrom datetime2 null,
    ValidTo datetime2 null,
    constraint UQ_ClientAnswerHistories_ValidFrom UNIQUE (ClientID,DataItemID,ValidFrom),
    constraint UQ_ClientAnswerHistories_ValidTo UNIQUE (ClientID,DataItemID,ValidTo),
    constraint CK_ClientAnswerHistories_NoTimeTravel CHECK (ValidFrom < ValidTo),
    constraint FK_ClientAnswerHistories_Clients FOREIGN KEY (ClientID) references dbo.Clients (ClientID),
    constraint FK_ClientAnswerHistories_DataItems FOREIGN KEY (DataItemID) references dbo.DataItems (DataItemID),
    constraint FK_ClientAnswerHistories_Prev FOREIGN KEY (ClientID,DataItemID,ValidFrom)
        references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidTo),
    constraint FK_ClientAnswerHistories_Next FOREIGN KEY (ClientID,DataItemID,ValidTo)
        references dbo.ClientAnswerHistories (ClientID,DataItemID,ValidFrom),
    constraint CK_ClientAnswerHistory_DeletionNull CHECK (
        Deleted = 0 or
        (
            IntValue is null and
            Comment is null
        )),
    constraint CK_ClientAnswerHistory_IntValueNotNull CHECK (Deleted=1 or IntValue is not null)
)
go

That's a lot of constraints. The only way to maintain this table is through merge statements (see examples below, and try to reason about why yourself). We're now going to build a view that mimics that ClientAnswers table defined above:

create view dbo.ClientAnswers
with schemabinding
as
    select
        ClientID,
        DataItemID,
        ISNULL(IntValue,0) as IntValue,
        Comment
    from
        dbo.ClientAnswerHistories
    where
        Deleted = 0 and
        ValidTo is null
go
create unique clustered index PK_ClientAnswers on dbo.ClientAnswers (ClientID,DataItemID)
go

And we have the PK constraint we originally wanted. We've also used ISNULL to reinstate the not null-ness of the IntValue column (even though the check constraints already guarantee this, SQL Server is unable to derive this information). If we're working with an ORM, we let it target ClientAnswers, and the history gets automatically built. Next, we can have a function that lets us look back in time:

create function dbo.ClientAnswers_At (
    @At datetime2
)
returns table
with schemabinding
as
    return (
        select
            ClientID,
            DataItemID,
            ISNULL(IntValue,0) as IntValue,
            Comment
        from
            dbo.ClientAnswerHistories
        where
            Deleted = 0 and
            (ValidFrom is null or ValidFrom <= @At) and
            (ValidTo is null or ValidTo > @At)
    )
go

And finally, we need the triggers on ClientAnswers that build this history. We need to use merge statements, since we need to simultaneously insert new rows, and update the previous "valid" row to end date it with a new ValidTo value.

create trigger T_ClientAnswers_I
on dbo.ClientAnswers
instead of insert
as
    set nocount on
    ;with Dup as (
        select i.ClientID,i.DataItemID,i.IntValue,i.Comment,CASE WHEN cah.ClientID is not null THEN 1 ELSE 0 END as PrevDeleted,t.Dupl
        from
            inserted i
                left join
            dbo.ClientAnswerHistories cah
                on
                    i.ClientID = cah.ClientID and
                    i.DataItemID = cah.DataItemID and
                    cah.ValidTo is null and
                    cah.Deleted = 1
                cross join
            (select 0 union all select 1) t(Dupl)
    )
    merge into dbo.ClientAnswerHistories cah
    using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0 and Dup.PrevDeleted = 1
    when matched then update set ValidTo = SYSDATETIME()
    when not matched and Dup.Dupl=1 then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
    values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,CASE WHEN Dup.PrevDeleted=1 THEN SYSDATETIME() END);
go
create trigger T_ClientAnswers_U
on dbo.ClientAnswers
instead of update
as
    set nocount on
    ;with Dup as (
        select i.ClientID,i.DataItemID,i.IntValue,i.Comment,t.Dupl
        from
            inserted i
                cross join
            (select 0 union all select 1) t(Dupl)
    )
    merge into dbo.ClientAnswerHistories cah
    using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
    when matched then update set ValidTo = SYSDATETIME()
    when not matched then insert (ClientID,DataItemID,IntValue,Comment,Deleted,ValidFrom)
    values (Dup.ClientID,Dup.DataItemID,Dup.IntValue,Dup.Comment,0,SYSDATETIME());
go
create trigger T_ClientAnswers_D
on dbo.ClientAnswers
instead of delete
as
    set nocount on
    ;with Dup as (
        select d.ClientID,d.DataItemID,t.Dupl
        from
            deleted d
                cross join
            (select 0 union all select 1) t(Dupl)
    )
    merge into dbo.ClientAnswerHistories cah
    using Dup on cah.ClientID = Dup.ClientID and cah.DataItemID = Dup.DataItemID and cah.ValidTo is null and Dup.Dupl = 0
    when matched then update set ValidTo = SYSDATETIME()
    when not matched then insert (ClientID,DataItemID,Deleted,ValidFrom)
    values (Dup.ClientID,Dup.DataItemID,1,SYSDATETIME());
go

Obviously, I could have built a simpler table (not a join table), but this is my standard go-to example (albeit it took me a while to reconstruct it - I forgot the set nocount on statements for a while). But the strength here is that, the base table, ClientAnswerHistories is incapable of storing overlapping time ranges for the same ClientID and DataItemID values.

Things get more complex when you need to deal with temporal foreign keys.


Of course, if you don't want any real gaps, then you can remove the Deleted column (and associated checks), make the not null columns really not null, modify the insert trigger to do a plain insert, and make the delete trigger raise an error instead.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
0

I've always taken a slightly different approach to the design if I have data that is never to have overlapping intervals... namely don't store intervals, but only start times. Then, have a view that helps with displaying the intervals.

CREATE TABLE intervalStarts
(
  ItemId      int,
  IntervalId  int,
  StartDate   datetime
)

CREATE VIEW intervals
AS
with cte as (
  select ItemId, IntervalId, StartDate,
     row_number() over(partition by IntervalId order by isnull(StartDate,'1753-01-01')) row
  from intervalStarts
)
select c1.ItemId, c1.IntervalId, c1.StartDate,
  dateadd(dd,-1,c2.StartDate) as 'EndDate'
from cte c1
  left join cte c2 on c1.IntervalId=c2.IntervalId
                    and c1.row=c2.row-1

So, sample data might look like:

INSERT INTO intervalStarts
select 1, 1, null union
select 2, 1, '2011-01-16' union
select 3, 1, '2011-01-26' union
select 4, 2, null union
select 5, 2, '2011-01-26' union
select 6, 2, '2011-01-14'

and a simple SELECT * FROM intervals yields:

ItemId | IntervalId | StartDate  | EndDate
1      | 1          | null       | 2011-01-15
2      | 1          | 2011-01-16 | 2011-01-25
3      | 1          | 2011-01-26 | null
4      | 2          | null       | 2011-01-13
6      | 2          | 2011-01-14 | 2011-01-25
5      | 2          | 2011-01-26 | null
chezy525
  • 4,025
  • 6
  • 28
  • 41
  • I think that this structure is highly likely to need a table scan for any query on "what was valid on day *X*". – Damien_The_Unbeliever May 17 '11 at 19:19
  • @Damien, yes, the view will require some more complicated query execution plans. However, I believe an index scan is almost always required for a "what was valid on day X" query on either structure. – chezy525 May 17 '11 at 19:56
  • If you can build a suitable structure where both ValidFrom and ValidTo dates are available, index *seek*s can occur, rather than *scan*s. – Damien_The_Unbeliever May 17 '11 at 20:02
  • Interesting solution. But it allows only sequential intervals. In my case, time gaps are possible. For example EndDate could be 17 and the next StartDate could be 19. – Egor4eg May 18 '11 at 07:38
  • @Egor4eg, the way I handle gaps is to have a fixed "gap" value... e.g. an IntervalId of 0 means "No Interval". In most cases that's implicitly what you're doing anyway. But I guess it really depends on your data. – chezy525 May 18 '11 at 18:33