I have a database of radiology reports which I have mined for incidents of pulmonary nodules. Each patient has a medical record number and each procedure has a unique accession number. Therefore, a MRN can have multiple Accession numbers for difference procedures. Accession numbers are ascending, so if a patient has multiple accession numbers the largest accession number is the latest procedure. I need to:
- Identify the oldest (initial) study
- Find the next study which comes soonest after the inital
- Calculate the time difference between each interval
I believe this problem can be solved using a correlated subquery. However, I am not yet adept enough at SQL to solve this. I have tried self joining the table and finding the max accession for each subquery. Some sample code below to make a dataset:
CREATE TABLE Stack_Example (Rank, Accession1, MRN1, Textbox2, Textbox47,Textbox43,Textbox45,ReadBy,SignedBy,Addendum1,ReadDate,SignedDate,Textbox49,Result,Impression,max_size_nodule, max_nodule_loc, max_nodule_type)
INSERT INTO Stack_Example
VALUES ("10", "33399", "001734", "5/21/1965", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "COMB, YAHAIR", "YES", "12/19/2014 11:48", "12/19/2014 17:50", "TEXT", "Results of Nodules!","Impressions of Nodules","3.0", "right middle lobe","None Found")
INSERT INTO Stack_Example
VALUES ("9", "33104", "001734", "5/21/1965", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "12/21/2013 06:52", "01/21/2014 06:52", "TEXT", "Results of Nodules!","Impressions of Nodules","3.7", "right upper lobe","None Found")
INSERT INTO Stack_Example
VALUES ("9", "33374", "001734", "5/21/1965", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "01/21/2014 08:19", "01/21/2014 06:52", "TEXT", "Results of Nodules!","Impressions of Nodules","2.1", "right lower lobe","None Found")
INSERT INTO Stack_Example
VALUES ("1", "34453", "001734", "5/21/1965", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "03/14/2014 09:14", "03/14/2014 09:14", "TEXT", "Results of Nodules!","Impressions of Nodules","1.4", "left upper lobe","None Found")
INSERT INTO Stack_Example
VALUES ("1", "27122", "80592", "1/14/1984", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "06/26/2013 10:20", "06/26/2013 10:20", "TEXT", "Results of Nodules!","Impressions of Nodules","2.5", "left upper lobe","None Found")
INSERT INTO Stack_Example
VALUES ("1", "27248", "80592", "1/14/1984", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "08/01/2013 06:23", "08/01/2013 06:23", "TEXT", "Results of Nodules!","Impressions of Nodules","4.0", "left lower lobe","None Found")
INSERT INTO Stack_Example
VALUES ("1", "28153", "35681", "03/01/1990", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "09/14/2012 05:00", "09/14/2012 05:00", "TEXT", "Results of Nodules!","Impressions of Nodules","4.0", "left lower lobe","None Found")
INSERT INTO Stack_Example
VALUES ("1", "29007", "35681", "03/01/1990", "CTS", "3341", "ROUTINE", "TUCK, YOURPANTSIN", "PICK, YASELFUP", "YES", "11/16/2012 08:23", "11/16/2012 08:23", "TEXT", "Results of Nodules!","Impressions of Nodules","3.5", "right lower lobe","None Found")
Obviously this is fake data. What I have been trying to do is join the table on itself with a correlated subquery. Like so:
SELECT DISTINCT a.Accession1, a.MRN1, a.ReadDate, p.Accession1, p.ReadDate
FROM Stack_Example as a
INNER JOIN Stack_Example as p on a.MRN1 = p.MRN1
WHERE a.Accession1 =
(SELECT max(Accession1)
FROM Stack_Example as b
WHERE a.MRN1 = b.MRN1 AND
a.Accession1 != p. Accession1)
ORDER BY a.MRN1
Ideally what I would like is a master table with one MRN for each patient on rows and accessions for each MRN as columns (alongside the dates for the accessions etc.). Something like this:
| MRN | Accession (First Follow-up) | Date First Followup |Accession (Second Follow-up)..| Date Second Follow up | etc.
|:-----------|----------------------------:|:-------------------:|
| 001734 | 33374 | ......
| 80592 | 27248 | ......
I believe the subquery I have needs a series of left joins; however, is there a better way of doing this? Some patients have upwards of 7 follow-ups. Appreciate any help and sorry for the long explanation. Hopefully the formatting is okay.