
I'm attempting to create a dataset similar to the referral data that CMS publishes. In short, two physicians are linked if they see the same patient within 30 days of one another.

I have a dataset which contains patients, physicians, and appointment dates, e.g.:

df <- data.frame(
  doctor = c("Dr. Who", "Dr. Pepper", "Dr.Bob", "Dr. Strangelove"),
  patient = c("Mickey", "Mickey", "Mickey", "Mickey"),
  date = c("2015-01-15", "2015-01-21", "2015-04-01", "2015-02-18")
)

With the above dataset, I would like to write some R code that would return:

  • Dr. Who, Dr. Pepper (because they see Mickey within 6 days of one another)
  • Dr. Pepper, Dr. Strangelove (they see Mickey within 28 days of one another)

My actual dataset contains many more doctors, patients, and dates. I don't have much of a computer science background, but this seems like it would be a computationally taxing task.

In plain English, the way I would process this problem is:

  1. Collect all patient appointments
  2. For each appointment date, find the difference in days from every other appointment date for the same patient
  3. Return the doctor pairs for any two appointments that are +/- 30 days from one another.

Please let me know if I can improve my question in any way. Thanks.

alistaire
mcharl02

1 Answer


You can do it with mapply, which applies a function elementwise over several vectors in parallel. Here, it loops over the doctor and date columns together: for each appointment, it subsets df to the rows whose date is within 30 days of that appointment and whose doctor is a different one, and keeps the doctor column. Multiple matches are combined with paste( ... , collapse = ', ').

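# Convert the character dates to Date so that subtraction yields days
df$date <- as.Date(df$date)

# For each appointment, list the other doctors seen within 30 days of it
df$linked_doc <- mapply(
  function(doc, date){paste(
    df[abs(date - df$date) < 30 & doc != df$doctor, 'doctor'], 
    collapse = ', ')}, 
  df$doctor, df$date)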

df
#            doctor patient       date               linked_doc
# 1         Dr. Who  Mickey 2015-01-15               Dr. Pepper
# 2      Dr. Pepper  Mickey 2015-01-21 Dr. Who, Dr. Strangelove
# 3          Dr.Bob  Mickey 2015-04-01                         
# 4 Dr. Strangelove  Mickey 2015-02-18               Dr. Pepper

There are other ways to do this, of course. If you have multiple patients, you can split on patient before applying the function.
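One such alternative, as a rough sketch rather than a tested solution for your real data: a self-join with merge pairs up every two appointments of the same patient, after which you filter on the gap in days. The `visits` data frame and the second patient below are made up for illustration, and I'm assuming "within 30 days" means a gap of at most 30 days (`<= 30`).

```r
# Hypothetical example data: two patients, dates stored as Date
visits <- data.frame(
  doctor  = c("Dr. Who", "Dr. Pepper", "Dr. Bob", "Dr. Strangelove",
              "Dr. Who", "Dr. Bob"),
  patient = c("Mickey", "Mickey", "Mickey", "Mickey", "Minnie", "Minnie"),
  date    = as.Date(c("2015-01-15", "2015-01-21", "2015-04-01", "2015-02-18",
                      "2015-03-01", "2015-03-10")),
  stringsAsFactors = FALSE
)

# Self-join on patient: every pair of appointments for the same patient
pairs <- merge(visits, visits, by = "patient", suffixes = c("_a", "_b"))

# Gap between the two appointments, in days
gap <- abs(as.numeric(pairs$date_a - pairs$date_b))

# doctor_a < doctor_b keeps each unordered pair once and drops self-pairs;
# gap <= 30 is the assumed reading of "within 30 days"
linked <- pairs[pairs$doctor_a < pairs$doctor_b & gap <= 30, ]
linked <- unique(linked[, c("patient", "doctor_a", "doctor_b")])
linked
```

This returns one row per linked pair of doctors per patient, which is closer to the pair list asked for in the question. Note that the join builds every pair of appointments for each patient, so memory use grows quadratically with the number of appointments a patient has.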

alistaire
  • Thank you! I think I need to go back to old code and see where else I can utilize mapply. I'm curious - is there another line of code you used to change df$date into the proper class that would work with the solution you gave? I had to do something like: `df$date <- parse_date_time(df$date, "%y%m%d"); df$linked_doc <- mapply( function(doc, date){paste( df[abs(date - df$date) < 2592000 & doc != df$doctor, 'doctor'], collapse = ', ')}, df$doctor, df$date)` where 2592000 is 30 days in seconds. When I tried to use difftime(..), I received var type errors – mcharl02 Mar 21 '16 at 19:08
  • Sorry, I should have included that! You need a date class, not a date-time one. In this case, all you need is `df$date <- as.Date(df$date)`. As your dates are already in ISO format, you don't even need to pass it a parsing string! – alistaire Mar 21 '16 at 19:54
  • Related question - how would you go about splitting on patient first? Is it a matter of nesting an mapply inside an sapply? – mcharl02 Mar 28 '16 at 18:15
  • There are options, but the simplest way might just be to add it as another variable and another condition: `mapply( function(doc, date, patient){paste( df[abs(date - df$date) < 30 & doc != df$doctor & patient == df$patient, 'doctor'], collapse = ', ')}, df$doctor, df$date, df$patient)` – alistaire Mar 29 '16 at 02:14