I'm attempting to create a dataset similar to the referral data that CMS publishes. In short, two physicians are linked if they see the same patient within 30 days of one another.
I have a dataset which contains patients, physicians, and appointment dates, e.g.:
df <- data.frame(
  doctor = c("Dr. Who", "Dr. Pepper", "Dr. Bob", "Dr. Strangelove"),
  patient = c("Mickey", "Mickey", "Mickey", "Mickey"),
  date = c("2015-01-15", "2015-01-21", "2015-04-01", "2015-02-18")
)
With the above dataset, I would like to write some R code that would return:
- Dr. Who, Dr. Pepper (because they see Mickey within 6 days of one another)
- Dr. Pepper, Dr. Strangelove (they see Mickey within 28 days of one another)
My actual dataset contains many more doctors, patients, and dates. I don't have much of a computer science background, but this seems like it would be a computationally taxing task.
In plain English, the way I would process this problem is:
- Collect all patient appointments
- For each appointment, find the difference in days between its date and the date of every other appointment for the same patient
- Return the doctor pairs for any two appointments that are within 30 days of one another.
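To make those steps concrete, here is a rough base R sketch on the toy data. It is only my attempt at illustrating the idea, not a tested solution: the names `pairs`, `links`, and `window_days` are placeholders I made up, and I'm assuming that "within 30 days" includes a gap of exactly 30 days.

```r
# Toy data from above; dates are converted to Date so they can be subtracted.
df <- data.frame(
  doctor  = c("Dr. Who", "Dr. Pepper", "Dr. Bob", "Dr. Strangelove"),
  patient = c("Mickey", "Mickey", "Mickey", "Mickey"),
  date    = as.Date(c("2015-01-15", "2015-01-21", "2015-04-01", "2015-02-18")),
  stringsAsFactors = FALSE
)

# Assumption: a gap of exactly 30 days still counts as a link.
window_days <- 30

# Step 1: self-join on patient, so every appointment is paired with every
# other appointment of the same patient (columns get _1 / _2 suffixes).
pairs <- merge(df, df, by = "patient", suffixes = c("_1", "_2"))

# Step 2: absolute gap in days between the two appointments in each row.
gap <- abs(as.numeric(pairs$date_2 - pairs$date_1))

# Step 3: keep rows inside the window. The doctor_1 < doctor_2 condition drops
# self-pairs and keeps each unordered pair of doctors only once.
keep  <- pairs$doctor_1 < pairs$doctor_2 & gap <= window_days
links <- unique(pairs[keep, c("doctor_1", "doctor_2")])

print(links)
```

On the toy data this leaves two pairs: Dr. Pepper with Dr. Who, and Dr. Pepper with Dr. Strangelove (names within a pair come out in alphabetical order). My worry is that the self-join creates one row for every pair of appointments per patient, so it grows quadratically with the number of visits a patient has, which is why I suspect it won't scale to my real data.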
Please let me know if I can improve my question in any way. Thanks.