--code to create a minimally reproduceable example
CREATE TABLE mre (
recid int,
caseID int,
t1 date,
t2 date,
t3 date,);
go
INSERT INTO mre (recid, caseID, t1, t2, t3)
VALUES (1, 1,'2018-04-01','2019-05-05','2019-05-05'),
(2, 2,'2006-10-12','2007-03-10',NULL),
(3, 3,'2018-05-05','2019-08-07','2019-07-09'),
(4, 4,'2005-07-01','2020-10-10',NULL),
(5, 5,'2018-08-25','2022-01-01','2021-12-12'),
(6, 5,'2018-08-25','2022-01-01','2021-12-12'),
(7, 6,'2020-09-20','2022-01-13','2022-01-04')
go
--here is the record set
select*
from mre
go
--create a table mimicking the desired query output
create table qryout (t3 date, perscount int, eventcount int)
go
insert into qryout (t3,perscount,eventcount )
VALUES('2019-05-05',4, 1),
('2019-07-09',3, 2),
('2021-12-12',2, 4),
('2022-01-04',1, 5)
--qry output should look like this
select*
from qryout
In the minimally reproduceable example, I have a record set with seven records, each with three time points. ‘recid’ is the unique record identifier, ‘caseid’ uniquely identifies an individual. ‘t1’ is the time someone entered a population, ‘t2’ is the time someone left a population and ‘t3’ is the time of an event occurring involving a person. If an event did not occur ‘t3’ contains no data.
For each event date, I want to know how many distinct individuals were in the total population at the time by counting those who had already joined by t3 (at or before t3) and had not left before t3, though they could leave on t3 and still be counted. Note that person number 5 happened to have two events on the same date. This person should not be counted twice in the number of people that were around at this time point.
So for the first instance of t3, involving person number 1, people 1,2,3 & 4 were in the sample at the time. Person 6 joined too late to be counted and person 2 had already left, so a total of four people were around when this event occurred.
For the second instance, subjects 3,4, & 5 were in the sample, a total of 3, and for the remaining two time points there were 2 and 1 respectively.
Also, at each time point I would like to know how many events had occurred so far. I can have multiple events occurring on the same day for the same person as stated above. So, the events involving person 5 need to be both counted in the total number of events.
I need a query which can output this summary. Computational efficiency is a consideration, as the datasets involved are fairly large (3-4M observations). I feel this may be a job for a count within a case statement, but have been unable to operationalise this. Currently I'm doing this in SQL server 2008. I am migrating the project to an SQL 2019 environment in the near future, so if there is a better solution using this version I'd be interested in looking at this as well. Any help gratefully received.