Disclaimer
While searching for an answer, I found this question, but I couldn't find a way to express the solution in SQL: Union of intervals
Background
I'm trying to calculate how long the people in the company I work in are employed. In the database I have (that is already in the company for years and is [sadly] not changeable), each contract is stored as one line. Each line has a lot of information about the employee and the contract, including a contract creation date, a contract rescission date (or infinity, if still active) and the current contract situation ("active" or "deactivated"). There are, however, two problems that are preventing me from simply doing what could seem obvious:
- People can be "multicontratual", so the same person could have multiple active lines at the same time.
- Sometimes, there are some transfers that result in deactivating one of a person's contracts and creating a new contract line. These transfers must not be counted (i.e., I should take into account both the timelines). There is, however, no explicit flag for the transfers existence in the database, so it was defined that "it is a transfer if there was any contract rescission until 60 days before a new contract is created".
When trying to account for the multiple cases that could arise from this scenario (e.g., if the same person had many contracts through the time, then no contracts during more than 60 days, and then some other contracts, then I'd want to start counting from after the "more-than-60-days" period), I found that two rules solve the problem. I need:
- The last contract creation where there was no other contract already active at the time. (this solves the problem 1)
- && there was no other active contract until 60 days before.
To the DB
To solve the problem, I decided to rearrange the rules. I wanted to take all contracts for which there was no other active contract until 60 days before its creation, and then take the "MAX()" of them. So, for example, for the following person, I would say she is active since 1973:
+----------+-----+-----------+-----------+---------------+-----------------+
| CONTRACT | ... | PERSON_ID | STATUS | CREATION_DATE | RESCISSION_DATE |
+----------+-----+-----------+-----------+---------------+-----------------+
| 1 | ... | 1 | deactived | 1973/10/01 | 1999/07/01 |
| 2 | ... | 1 | deactived | 1978/06/01 | 2000/07/01 |
| 3 | ... | 1 | deactived | 2000/08/01 | 2008/06/01 |
| 4 | ... | 1 | active | 2000/08/01 | infinity |
| 5 | ... | 1 | active | 2000/08/01 | infinity |
+----------+-----+-----------+-----------+---------------+-----------------+
I am treating the dates as if they were integers (in fact, they are in the real database). My question is: how could I create a query to take the "1973/10/01"? I.e., how could I get all the "creation_date"s that are distant from (higher than) the others in at least 60, and that are not in the intervals described by the other lines?
[and, anyway, does this seem the best way to solve the problem? (I don't think so)]