PDC calculation in R - Removing Loops

Question

I am just starting to learn R, which has been really useful, and I'm trying to use it to calculate Proportion of Days Covered. This metric has to do with measuring a person's adherence to their medication. Basically, for a given time period you find all the fills of a drug take the fill date and the number of days in the supply to determine which days they were covered for. E.g. if a person gets a 35 day fill on 2/1/2016, they have coverage from 2/1/16 through 3/6/2016. Easy enough.

This gets tricky when they go back for a fill before they run out of coverage on the first fill, you don't double count days (e.g. the person gets their second fill on 3/1/2016, 3/1-3/6 are only counted once).

I've actually written some code that seems to be working properly, but its using FOR loops, which I've come to learn don't work well in R and I'm worried when I start throwing a bunch of data at it.

Here is the first part of the code that builds the test data and initializes some variables:

#Create test data vectors

  Person <- c(rep("Person1",12),rep("Person2",9))
  FillDate <- c("2016-1-1", "2016-2-1", "2016-3-1", "2016-4-1", "2016-5-1", "2016-6-1", "2016-7-1", "2016-8-1", "2016-9-1", "2016-10-1",    "2016-11-1",    "2016-12-1",    "2016-2-1", "2016-3-1", "2016-4-20",    "2016-5-1", "2016-6-1", "2016-7-1", "2016-8-1", "2016-9-1", "2016-10-1")
  DaysSupply <- c(rep("35", 14),    "20",   "5",    "20",   rep("35",   4))

  #Build into data.frame
  PDCTestData <- cbind.data.frame(as.factor(Person),as.Date(FillDate,"%Y-%m-%d"),as.numeric(DaysSupply))
  colnames(PDCTestData) <- c("Person","FillDate","DaysSupply")

#Create start and end dates for overall period
StartDate <- as.Date("2016-01-01")
EndDate <- as.Date("2016-12-31")

#Initialize DaysCoveredList, a vector to hold the list of dates that a person has drug coverage
DaysCoveredList <- NULL

#Initialize DaysCoveredTable, a matrix to count the total number of unique days in the DaysCovered List, by person
DaysCoveredTable <- NULL

and the second part that does the actual work:

#Begin looping through individuals
for(p in levels(PDCTestData$Person)){

  #Begin looping through drug fills
  for(DrugSpan in 1:nrow(PDCTestData[PDCTestData$Person == p,])){

    #Create a sequence of the dates covered by that fill, the sequence starts on the fill date and runs for the number of days in Days Supply, Builds a list of all days covered for that person
    DaysCoveredList <- c(DaysCoveredList,seq.Date(from = PDCTestData[PDCTestData$Person == p,][DrugSpan,]$FillDate, length.out = PDCTestData[PDCTestData$Person == p,][DrugSpan,]$DaysSupply, by = "day"))

  } #Exit drug fill loop

  #Counts the number of unique days covered from the DaysCovredList, with in the start and end of the overall period
  DaysCovered <- length(unique(DaysCoveredList[DaysCoveredList >= StartDate & DaysCoveredList <= EndDate]))

  #Adds the unique count from DaysCovered to the summary DaysCoveredTable
  DaysCoveredTable <- rbind(DaysCoveredTable,cbind(p,DaysCovered))

  #Clear DaysCovered and DaysCovredList
  DaysCovered <- NULL
  DaysCoveredList <- NULL
} #Exit the individual loop

Any help you can offer is appreciated.

Thanks.

Before assuming you need to spend your precious time on optimizing, why not try it with a million rows (or however many you need to deal with) and see if it is slow? Just do `PDCTestData <- PDCTestData[rep(1:21, 100000),]` and see what happens. — , Mar 21 '17 at 21:41

score 0 · Answer 1 · answered Mar 21 '17 at 23:35

library(lubridate)
ptd <- PDCTestData # I get bored writing long variable names

ptd$EndDate <- ptd$FillDate + ptd$DaysSupply
ptd$DrugInterval <- interval(ptd$FillDate, ptd$EndDate)

all_days <- as.Date(StartDate:EndDate, origin = "1970-01-01")

lapply(unique(ptd$Person), function (y) sum(sapply(all_days, function (x) any(x %within% ptd$DrugInterval[ptd$Person==y]))))

No guarantees about speed, but maybe easier to read.

PDC calculation in R - Removing Loops

1 Answers1