My table (@MyTable) is a list of IDs with start dates and end dates (inclusive) that represent an interval of days when the ID appears in a file that is received once per day:
ID Start_Date End_Date
1 10/01/2014 12/15/2014
2 11/05/2014 03/03/2015
3 12/07/2014 12/09/2014
4 04/01/2015 04/15/2015
Each ID appears only once, i.e. only has 1 associated time interval, and intervals between Start_Dates and End_dates can (but not necessarily) overlap across different IDs. I need a SQL query to find the sets of dates where each ID will appear at least once when the files from these sets of dates are merged, in the smallest number of dates as possible. In the table above the solution could be these 2 dates:
File_Date ID(s)
12/07/2015 1,2,3
04/01/2015 4
But for the example any 1 date between ID(3)'s Start_date and End_date & combined with 1 date between ID(4)'s Start_date and End_date would be a solution.
The actual data consists of 10,000 different IDs. The date range of possible file dates is 04/01/2014 - 07/01/2015. Each daily file is very large in size and must be downloaded manually, hence I want to minimize the number I must download to include all IDs.
So far I have a CTE that results in separate rows for all dates between the Start_Date and End_date of each ID:
;WITH cte (ID, d)
AS
(
SELECT
tbl.ID AS ID,
tbl.Start_Date AS d
FROM @MyTable tbl
UNION ALL
SELECT
tbl.ID AS ID,
DATEADD(DAY, 1, cte.d) AS d
FROM cte
INNER JOIN
@MyTable tbl ON cte.ID = tbl.ID
WHERE cte.d < tbl.End_Date
)
SELECT
ID AS ID,
d AS File_Date
FROM cte
ORDER BY ID,d
OPTION (MaxRecursion 500)
Using @MyTable example results are:
ID File_Date
1 10/01/2014
1 10/02/2014
1 10/03/2014
1 etc...
My thinking was to determine the most common File_Date among all the IDs, then pick the next most common File_Date among all the IDs left, and so on...but I'm stuck. To put it in more mathy terms, I am trying to find the fewest sets (File_Dates) that contain all the items (IDs), similar to https://softwareengineering.stackexchange.com/questions/263095/finding-the-fewest-sets-which-contain-all-items, but I don't care about minimizing duplicates. The final results do not have to include which IDs appear in which File_Dates; I just need to know all the File_Dates.
I'm using MS SQL Server 2008.