
I have a dataset of clinics; each clinic is made up of doctors, who perform procedures on patients.

I have written functions to perform analyses on the dataset, filtering by lists of clinics or lists of doctors (a simple one is below):

library(dplyr)

num.of <- function(x.doctor, x.clinic) {
  if (!missing(x.clinic)) {
    df_filter <- filter(df_clean, clinic == x.clinic)
  }
  if (!missing(x.doctor)) {
    df_filter <- filter(df_clean, doctor == x.doctor)
  }
  num_doctor <- length(unique(df_filter$doctor))
  num_surveys <- nrow(df_filter)
  num_procedure <- length(unique(df_filter$PPID))
  result <- setNames(c(num_doctor, num_surveys, num_procedure),
                     c("num_doctor", "num_surveys", "num_procedure"))
  return(result)
}

I am trying to call these functions with either a list of doctors or a list of clinics:

sapply(doctor_list, num.of, x.clinic = NULL)

However, the function only works when the 'first' argument is passed; that is, the call above does not work, but this one does:

sapply(clinic_list, num.of, x.doctor = NULL)

If I reverse the order of the arguments in the function definition, the opposite of the above is true.

Each call supplies only one of the two arguments: either a list for x.doctor or a list for x.clinic.

How can I rewrite my functions so that sapply works with x.clinic and, in a separate call, with x.doctor?

Thank you!

isaacsultan

1 Answer


Try this:

num.of <- function(x, data, type = c("doctor", "clinic")) {
  type <- match.arg(type)
  df_filter <-
    if (type == "doctor") {
      filter(data, doctor == x)
    } else {
      filter(data, clinic == x)
    }
  num_doctor <- length(unique(df_filter$doctor))
  num_surveys <- nrow(df_filter)
  num_procedure <- length(unique(df_filter$PPID))
  result <- setNames(c(num_doctor, num_surveys, num_procedure), c("num_doctor", "num_surveys", "num_procedure"))
  return(result)
}

This enables an explicit and clear call:

sapply(doctor_list, num.of, data = df_clean, type = "doctor")
sapply(clinic_list, num.of, data = df_clean, type = "clinic")
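As for why the original version misbehaves: missing() returns TRUE only when an argument is not supplied at all. Passing x.clinic = NULL counts as supplying it, so !missing(x.clinic) is TRUE and the clinic filter still runs, now comparing the column with NULL, which is a zero-length comparison rather than "no filter". Both if blocks can therefore run, and the later assignment to df_filter overwrites the earlier one, which would explain why the behaviour depends on the argument order. A minimal base-R illustration (f is just a throwaway name):

```r
# Throwaway function: report whether each argument counts as "missing"
f <- function(a, b) {
  c(a_missing = missing(a), b_missing = missing(b))
}

f(1)            # b not supplied at all: b_missing is TRUE
f(1, b = NULL)  # b supplied as NULL:    b_missing is FALSE

# Comparing a column with NULL gives a zero-length logical vector,
# not a per-row TRUE/FALSE, so it is not equivalent to "no filter"
length(c("A", "B") == NULL)  # 0
```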

I also took the liberty of fixing a scope breach: reading df_clean from inside the function (instead of passing it in) may work, but it can cause problems later. It makes the function depend on its calling context and inflexible when you have more than one dataset. Even if you are 100% certain you will always, always, always have df_clean in your calling (or global) environment for this case, passing the data explicitly is a good habit (one of the "Best Practices™").
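If you would rather keep the original two-argument interface, one alternative is to default both arguments to NULL and test them with is.null(), which (unlike missing()) treats an explicit NULL the same as an omitted argument. The sketch below is only that: num.of2 and the toy data are hypothetical names, and base-R subsetting stands in for dplyr::filter() so the example runs without extra packages.

```r
# Sketch only: num.of2 and toy are hypothetical names.
# Both filters default to NULL and are tested with is.null(), so an omitted
# argument and an explicit NULL behave the same way.
num.of2 <- function(data, x.doctor = NULL, x.clinic = NULL) {
  df_filter <- data
  if (!is.null(x.clinic)) {
    # which() drops NA comparisons, similar to how filter() treats NA
    df_filter <- df_filter[which(df_filter$clinic == x.clinic), , drop = FALSE]
  }
  if (!is.null(x.doctor)) {
    # Filters are chained, so supplying both narrows to that doctor in that clinic
    df_filter <- df_filter[which(df_filter$doctor == x.doctor), , drop = FALSE]
  }
  c(num_doctor    = length(unique(df_filter$doctor)),
    num_surveys   = nrow(df_filter),
    num_procedure = length(unique(df_filter$PPID)))
}

toy <- data.frame(clinic = c("A", "A", "B"),
                  doctor = c("d1", "d2", "d3"),
                  PPID   = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# An anonymous function makes it explicit which argument each element fills
sapply(c("A", "B"), function(cl) num.of2(toy, x.clinic = cl))
sapply(c("d1", "d3"), function(dr) num.of2(toy, x.doctor = dr))
```

Unlike the original, the two filters are applied one after the other rather than overwriting each other, so calling num.of2(toy) with neither argument simply counts the whole dataset.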

If this doesn't work, you may need to provide a more reproducible example so that we can actually test the function. Since you may not want to share your actual data, it makes things much easier for everyone else if you keep it as generic as possible, with simple names and simple example data.
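Along those lines, here is a small made-up dataset (the values are invented; only the column names clinic, doctor and PPID come from the question) together with a base-R variant of the function above. Selecting the column with data[[type]] removes the need for the if/else around filter(); the name num.of.base is hypothetical, and dplyr is not required for this sketch.

```r
# Invented toy data mirroring the question's columns:
# each row is one survey, PPID identifies a procedure.
df_clean <- data.frame(
  clinic = c("A", "A", "A", "B", "B"),
  doctor = c("d1", "d1", "d2", "d3", "d3"),
  PPID   = c(101, 102, 103, 104, 104),
  stringsAsFactors = FALSE
)

# Base-R variant of num.of(): `type` names the column to filter on
num.of.base <- function(x, data, type = c("doctor", "clinic")) {
  type <- match.arg(type)
  df_filter <- data[which(data[[type]] == x), , drop = FALSE]
  c(num_doctor    = length(unique(df_filter$doctor)),
    num_surveys   = nrow(df_filter),
    num_procedure = length(unique(df_filter$PPID)))
}

# d1: 1 doctor, 2 surveys, 2 procedures; d3: 1 doctor, 2 surveys, 1 procedure
sapply(c("d1", "d3"), num.of.base, data = df_clean, type = "doctor")
# A: 2 doctors, 3 surveys, 3 procedures; B: 1 doctor, 2 surveys, 1 procedure
sapply(c("A", "B"), num.of.base, data = df_clean, type = "clinic")
```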

r2evans