0

I am storing intervals where From < To for each entry. If I build an index on "From", then I can get the minimum and maximum entries efficiently when inserting a new record to do a validation check. However, this doesn't help if there were a big gap somewhere in these entries where a given record could've easily fit. For example:

[3, 5]
[8, 9]

If I want to insert [6,7] or [1,2] above, what is the best way to check?

Just wondering if there's an efficient way of ensuring that a record doesn't overlap with any of the existing entries without comparing against all of them.

EDIT: I am looking for a c#.Net solution. Sql server is just being used as a data store. I thought it important to mention since querying to check existing values is a part of it.

Riz
  • 6,486
  • 19
  • 66
  • 106
  • How are you currently checking? What object are you using to store `[3, 5]`. Assuming a custom class, you could write some helper methods on it that take in another interval and return a value, like `Contains` (the other interval is completely contained within this one) and `OverlapsWith` (the other interval overlaps this interval. But without your current code, it's a little hard to help. Or is this stricly a SQL table, and you're looking for query help? – Rufus L Apr 01 '21 at 00:57
  • 1
    The more hints you can give SQL Server the better. I have found in the past that adding a table check constraint `EndDate > StartDate` made a difference to the query plan and improved performance. – Nick.Mc Apr 01 '21 at 00:58
  • To clarify, are you looking for a T-SQL solution or a C# solution? Any C# solution may require some care to ensure that it runs efficiently. – Nick.Mc Apr 01 '21 at 01:02
  • @Nick.McDermaid I am ideally looking for a c# solution since that's where all the work will be done. – Riz Apr 01 '21 at 01:55
  • Beware of getting engaged in this holy war :) https://stackoverflow.com/questions/1473624/business-logic-in-database-versus-code Note that a number of people simply assumed you would put the logic in the DB. It's certainly a valid option. Databases are much more than data stores. But I 100% understand your preference to do this in C#. Just keep in mind that for very large datasets it can sometime be difficult to implement these thing efficiently in the application layer. – Nick.Mc Apr 01 '21 at 02:00
  • Also keep in mind that databases / SQL are declarative. You tell it what to do and it works out the fastest way to do it based on support structures (indexes), data statistics etc. This can be really powerful, i.e. you can get large performance improvements without changing your actual code. Also there is no requirement to code loops etc. in fact if you do code a loop in SQL you're probably doing it wrong. OK I shuddup now. – Nick.Mc Apr 01 '21 at 02:04
  • I get what you're saying. But part of the requirement is to check the records being inserted in app layer against records in db and flag an error if it will overlap with any existing layers. I suppose I could just catch an appropriate constraint exception in app layer but it'd be nice to have even an sql procedure that checks that. Do you mind providing some code, even if it is sql since you mentioned it shouldn't have any loops? – Riz Apr 01 '21 at 02:17
  • @Riz `catch an appropriate constraint exception in app` - do you know how to create an efficient table constraint that would prevent overlapping intervals from being added to the table? Enlighten us please, then, because I don't know how it could be implemented, in MS SQL Server anyway. – Roger Wolf Apr 01 '21 at 02:57
  • @RogerWolf No I do not. I was going by what Nick was suggesting. – Riz Apr 01 '21 at 03:01
  • @RogerWolf I agree there is no constraint that will do this. I'm just spitballing here to clarify that it doesn't just have to be done in the app layer. Anyway you look to have it all covered in your answer – Nick.Mc Apr 02 '21 at 00:33

1 Answers1

0

Yes, there is a way. Suppose your table looks like this:

drop table if exists #Intervals;
go
-- Setup
create table #Intervals (
    Id int identity(1,1) primary key,
    StartValue int not null,
    EndValue int not null,
    check (StartValue < EndValue)
);
go
create index [IX_Intervals_StartValue_EndValue] on #Intervals (StartValue, EndValue);
go

insert into #Intervals (StartValue, EndValue)
values
    (3,5),
    (8,9);
go

Now, you can insert your data (in bulk, if necessary) with this simple check:

-- Before
select * from #Intervals;

-- Inserting new records
insert into #Intervals (StartValue, EndValue)
select s.*
from (values
    (6,7),
    (1,2),
    (4,6) -- Should not be inserted, overlaps with pre-existing (3, 5)
) s (StartValue, EndValue)
where not exists (
    select 0 from #Intervals i
    where i.StartValue < s.EndValue
        and i.EndValue > s.StartValue
);

-- After
select * from #Intervals;

The only thing to look out for is that your new intervals should not overlap with each other; that needs to be checked before the insert into the table.

Regarding the efficiency, the index I used might not be the most optimal one. You will need to monitor index usage and suggestions when your table will become sufficiently large. A viable alternative might be having 2 separate indices on each of the interval ends.

Roger Wolf
  • 7,307
  • 2
  • 24
  • 33
  • will this loop do comparison check in logN time for inserts? – Riz Apr 01 '21 at 01:58
  • @Riz, if the index on `(StartValue, EndValue)` will be used by the query, then yes - B-tree gives you log(O). generally speaking. – Roger Wolf Apr 01 '21 at 02:36