Unique Constraint with Dates

Question

I need to add a unique constraint to an Oracle database table where a foreign key reference can only exist more than once if 2 other columns which are dates don't overlap

e.g.

car_id  start_date  end_date
3       01/10/2012  30/09/2013  
3       01/10/2013  30/09/2014  -- okay no overlap
3       01/10/2014  30/09/2015  -- okay no overlap
4       01/10/2012  30/09/2013  -- okay different foregin key
3       01/11/2013  01/01/2014  -- * not allowed overlapping dates for this car.

Any suggestions? Thanks in advance.

for me best practice would be not to use triggers (and it is rather a common practice to avoid them when really not needed). I would go with a procedure set for set/get row in the table to check whether you have an overlap or not. then, if you have an overlap simply return an error.. — przemo_pl, Oct 20 '15 at 08:09
@chaitanyakvv No. In fact, you should avoid (ab)using triggers. — Lalit Kumar B, Oct 20 '15 at 08:13
Add check constraints instead. This will help for this: [Conditional Unique Constraint](http://stackoverflow.com/questions/866061/conditional-unique-constraint) — Muhammad Muazzam, Oct 20 '15 at 09:36

score 0 · Answer 1 · answered Oct 20 '15 at 11:04

Last time I've seen a requirement and a solution for this, I've seen this:

Create an after statement trigger. In this trigger do a self join on your table like this:

select count(*)
from your_table a 
join your_table b
on a.car_id = b.car_id and
  (a.start_date between b.start_date and b.end_date
   or 
   b.start_date between a.start_date and a.end_date)

If count is zero then everything is ok. If count > 0 then raise an exception and the statement will be rolled back.

OBS: This will not work for tables with > millions of rows and many inserts. It works on small lookup tables or, if you have a big table, with big table and seldom inserts(batch inserts).

TommCatt · Answer 2 · 2015-10-22T01:03:32.013

I take it that cars are tracked through some sort of process and every date records a state change. For example, you show that car #3 underwent a state change on 1 Oct 2012, again on 1 Oct 2013 and again on 1 Oct 2014. The final entry implies that the state changed again on 1 Oct 2015. Where is the entry showing that? Or is the state something that always lasts exactly one year -- making it possible to specify the end of the state as soon as the state begins? If so, then the entry showing the state change on 1 Nov 2013 is simply wrong. But the one-year specification could just be a coincident. You could have just picked simplistic data points for your example data.

Your concern at this point is to strictly identify valid data from accurate data. We design databases (or should) with an emphasis on data integrity or validity. That means we as sharply as possible constrain each piece of data so it is consistent with the specifications of that piece of data.

For example, the car id field is a foreign key -- generally to a table that defines each instance of the car entity. So we know that at least two cars exist with an id of 3 and 4. Else those values could not exist in the example you show.

But what about accuracy or correctness? Suppose in the last entry in your example, the car id 3 should really have been 4? There is no way to tell from within the database. This illustrates the difference. Both the 3 and 4 are valid values and we are able to constrain these to only valid values. But only one is correct -- assuming for a moment they are the only two cars so far defined. The point is, there is no test, no way to constrain the values to the one that is correct. We can check for validity -- not accuracy.

What you are trying to do is check for accuracy with a validity test. You may claim the "no overlaps" restriction becomes a validity check, but this is just a sort of accuracy check. We can sometimes perform tests to signal data anomalies that indicate an inaccuracy exists somewhere. For example, the overlap could mean the end date of 30 Sep 2014 (second row) is wrong or the start date of 1 Nov 2013 (last row) is wrong or both could be wrong. We have no idea which situation this represents. So we can't just prevent the last row from being entered into the database -- it might be correct with the second row being incorrect.

Invalid data is invalid on its own. Suppose an attempt is made to insert a row for car id 15 and there is no entry for car 15 in the CARS table. Then the value 15 is invalid and the row can be (and should be) prevented from ever entering the table. But date period overlaps are caused by wrong data somewhere -- we have no way of knowing exactly where. We can signal the inconsistency to the user or make a log entry somewhere to have someone look into the problem, but we shouldn't reject the row that "caused" the overlap when it could very well be the existing row that contains the wrong data.

Accuracy, like the data itself, originates from outside the database. If we are lucky enough to be able to detect instances of inaccuracy, the solution also lies outside the database. The best we can do is flag it and have someone investigate to determine what data is correct and what is incorrect and (hopefully) correct the inaccuracy.

UPDATE: Having discussed a bit the concepts of data integrity and accuracy and the differences between them, here is a design idea that may be an improvement.

Note: this is based on the assumption that the date ranges form an unbroken range for each car from the first entry to the last. That is, there are no gaps.

Simple: do away with the end_date field altogether. The first entry for a car sets up the current state of that car with no end date specified. The clear implication is that the state continues indefinitely into the future until the next state change is inserted. The start date of the second state change then becomes the end date of the first state change. Continue as needed.

create table Car_States(
    Car_ID     int not null,
    Start_Date date not null,
    ...,      -- other info
    constraint FK_Car_States_Car foreign key( Car_ID )
        references Cars( ID ),
    constraint PK_Car_States primary key( Car_ID, Start_Date )
);

Now let's look at the data

car_id  start_date
3       01/10/2012
3       01/10/2013  -- okay no overlap
3       01/10/2014  -- okay no overlap
4       01/10/2012  -- okay different foreign key
3       01/11/2013  -- What does this mean???

Before that final row was entered, here is how the data is read for the car with id = 3: Car 3 started life in a particular state on 1 Oct 2012, changed to another state on 1 Oct 2013 and then again on 1 Oct 2014 where it remains.

Now the final row is entered: Car 3 started life in a particular state on 1 Oct 2012, changed to another state on 1 Oct 2013, changed to another state on 1 Nov 2013 and then again on 1 Oct 2014 where it remains.

As we can see, we are able to absorb the new data easily into the model. The design makes it impossible to have gaps or overlaps.

But is this really an improvement? What if the last entry was a mistake -- possibly meant to be for a different car instead of car 3? Or the wrong dates were entered. The new model just accepted the incorrect data with no complaints and we proceed not knowing we have incorrect data in the table.

This is true. But how is it any different from the original scenario? The last row represents "wrong" data. The question was, "How do I prevent this?" The answer is, in both cases, "You can't! Sorry." The best either design can do is detect the discrepancy and bring it to someone's attention.

One might think that with the original design, with the start and end dates in the same row, it is easy to determine if the new period overlapped any previously defined period. But this is also easily determined with the start-date-only design. What is important is that the test for such possible inaccuracies being discovered before the data is written to the table primarily lies with the application, not just within the database.

It is up to the users and/or some automated process to verify new and existing data and determine if any inaccuracies exist. The advantage of using only one date is that, after displaying a warning message with an "Are you sure?" response, the new record can be inserted and the operation is finished. With two dates, other records must be found and their dates resynched to match the new period.

The "no overlap" is a constraint just like a unique constraint. Would you say duplicate data is also just an matter of accuracy and you shouldn't try to avoid it in the database? — , Oct 20 '15 at 20:41
No, they are completely different. A unique constraint defines a key field(s) and is generally necessary of the proper functioning of the relational database. A "no overlap" is a design-specific specification of the model. Yes, it may be necessary to the proper functioning of the *model*, but the database can work perfectly well with overlaps all over the place. Besides, I'm not saying accuracy is to be avoided, I'm saying that you can't get there from here. Quick, go through all the sample rows provided by the OP and test each field for accuracy. Keep us informed of your progress. — TommCatt, Oct 21 '15 at 14:31
Well to me, a constraint "*there can only be one reservation for a single room for any given time*" is the essentially the same as "*there can only be one row with this column value*" - but apparently we have different views on this. The transaction that tries to insert the "duplicate" gets a slap on the finger — , Oct 21 '15 at 14:34
From the pov of the users, yes, they are the same thing. But they only see the model. Inside the database, where we deal only with the physical, they are quite different, although one of our tasks is to make them appear the same to the users -- present them with a consistent picture of the model. Makes their job easier. Alas, makes ours more difficult. — TommCatt, Oct 21 '15 at 15:25

Unique Constraint with Dates

2 Answers2