Apologies for a semi 'double post'. I feel I should be able to crack this, but I'm going round in circles. It follows on from my previous, well-answered question:

Within ID, check for matches/differences

test <- data.frame(
ID=c(rep(1,3),rep(2,4),rep(3,2)),
DOD = c(rep("2000-03-01",3), rep("2002-05-01",4), rep("2006-09-01",2)),
DOV = c("2000-03-05","2000-06-05","2000-09-05",
    "2004-03-05","2004-06-05","2004-09-05","2005-01-05",
    "2006-10-03","2007-02-05")
)

What I want to do is tag the subjects whose first visit (as recorded in DOV) was less than 180 days after their diagnosis (DOD). I have the following, using the plyr package.

ddply(test, "ID", function(x) ifelse(as.numeric(as.Date(x$DOV[1]) - as.Date(x$DOD[1])) < 180, 1, 0))

Which gives:

  ID V1
1  1  1
2  2  0
3  3  1

What I would like is a vector 1,1,1,0,0,0,0,1,1 so I can append it as a column to the data frame. The ddply call is fine as far as it goes: it makes a 'lookup' table showing which IDs had their first visit within 180 days of their diagnosis. I could then go back through my original test data and build an indicator variable from it, but I'd have thought I should be able to do this in one step.
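For reference, the two-step route described above can be sketched in base R roughly as follows. This is only an illustration: the names `first_rows`, `lookup` and `flag` are made up here, the dates are converted with as.Date up front, and the first row of each ID is assumed to be the first visit.

```r
# Sketch only: 'first_rows', 'lookup' and 'flag' are illustrative names, not from the post.
test <- data.frame(
  ID  = c(rep(1, 3), rep(2, 4), rep(3, 2)),
  DOD = as.Date(c(rep("2000-03-01", 3), rep("2002-05-01", 4), rep("2006-09-01", 2))),
  DOV = as.Date(c("2000-03-05", "2000-06-05", "2000-09-05",
                  "2004-03-05", "2004-06-05", "2004-09-05", "2005-01-05",
                  "2006-10-03", "2007-02-05"))
)

# Step 1: one row per ID, taking the first row of each ID as the first visit
first_rows <- test[!duplicated(test$ID), ]
lookup <- data.frame(
  ID   = first_rows$ID,
  flag = as.integer(as.numeric(first_rows$DOV - first_rows$DOD) < 180)
)

# Step 2: map the per-ID flag back onto every row of the original data
test$flag <- lookup$flag[match(test$ID, lookup$ID)]
```

Here match() does the 'go through and make an indicator' part: it finds, for every row of test, the position of its ID in the lookup table.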

I'd also like to use base if possible. I had a method with 'by', but again it only gave one result per ID and was also a list. Have been trying with aggregate but getting things like 'by has to be a list', then 'it's not the same length' and using the formula method of input I'm stumped 'cbind(DOV,DOD) ~ ID'...

Appreciate the input, keen to learn!

nzcoops

2 Answers

Once the date columns have been converted with as.Date, this returns the desired indicator vector, assuming the data frame named 'test' is sorted by ID (and it uses only base R):

 # convert the date columns first (they are created as character or factor)
 test$DOD <- as.Date(test$DOD)
 test$DOV <- as.Date(test$DOV)
 # could put an ordering operation here if needed
 0 + unlist(      # to make vector from list and coerce logical to numeric
        lapply(split(test, test$ID),       # to apply fn within each ID
          function(x) rep(                 # to extend a per-ID value across all rows of that ID
                   min(x$DOV-x$DOD) <180,  # compare the minimum of a set of intervals
                   NROW(x)) ) )
 # 11 12 13 21 22 23 24 31 32                 # the labels
 #  1  1  1  0  0  0  0  1  1                 # the values
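If the rows are not sorted by ID, one possible way around the ordering step is ave(), which applies a function within each group and returns the results in the original row order. The following is a minimal sketch under that assumption; `shuffled`, `gap` and `within180` are names invented for the example, and the rows are deliberately put out of order.

```r
# Sketch only: 'shuffled', 'gap' and 'within180' are illustrative names.
test <- data.frame(
  ID  = c(rep(1, 3), rep(2, 4), rep(3, 2)),
  DOD = as.Date(c(rep("2000-03-01", 3), rep("2002-05-01", 4), rep("2006-09-01", 2))),
  DOV = as.Date(c("2000-03-05", "2000-06-05", "2000-09-05",
                  "2004-03-05", "2004-06-05", "2004-09-05", "2005-01-05",
                  "2006-10-03", "2007-02-05"))
)

# Put the rows out of order to show that no sorting is needed
shuffled <- test[c(9, 1, 5, 2, 8, 3, 4, 6, 7), ]

# Days from diagnosis to each visit
gap <- as.numeric(shuffled$DOV - shuffled$DOD)

# Within each ID, test the smallest gap; ave() recycles the single
# result over all rows of that ID, in the original row order
shuffled$within180 <- ave(gap, shuffled$ID,
                          FUN = function(g) as.numeric(min(g) < 180))
```

As in the code above, this takes the minimum interval per ID rather than literally the first row, so it does not depend on the visits being in date order either.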
IRTFM

I have added stringsAsFactors=FALSE to the data.frame call, so the dates stay as character strings:

test <- data.frame(ID=c(rep(1,3),rep(2,4),rep(3,2)),
         DOD = c(rep("2000-03-01",3), rep("2002-05-01",4), rep("2006-09-01",2)),
         DOV = c("2000-03-05","2000-06-05","2000-09-05","2004-03-05",  
          "2004-06-05","2004-09-05","2005-01-05","2006-10-03","2007-02-05")
         , stringsAsFactors=FALSE)

CODE

test$V1 <- ifelse(c(FALSE, diff(test$ID) == 0), 0, 
                   1*(as.numeric(as.Date(test$DOV)-as.Date(test$DOD))<180))
test$V1 <- ave(test$V1,test$ID,FUN=max)
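The two lines above rely on each ID's rows being contiguous with the earliest visit first. A sketch of the same idea with an explicit ordering step and the intermediate vector pulled out (`first_flag` is a name made up for this illustration) might look like this:

```r
# Sketch only: 'first_flag' is an illustrative name, not from the answer.
test <- data.frame(
  ID  = c(rep(1, 3), rep(2, 4), rep(3, 2)),
  DOD = c(rep("2000-03-01", 3), rep("2002-05-01", 4), rep("2006-09-01", 2)),
  DOV = c("2000-03-05", "2000-06-05", "2000-09-05", "2004-03-05",
          "2004-06-05", "2004-09-05", "2005-01-05", "2006-10-03", "2007-02-05"),
  stringsAsFactors = FALSE
)

# Sort so each ID's rows are contiguous and its earliest visit comes first
test <- test[order(test$ID, as.Date(test$DOV)), ]

# Step 1: evaluate the 180-day test on the first row of each ID only;
# every later row of the same ID gets 0
first_flag <- ifelse(c(FALSE, diff(test$ID) == 0), 0,
                     1 * (as.numeric(as.Date(test$DOV) - as.Date(test$DOD)) < 180))
first_flag
# 1 0 0 0 0 0 0 1 0

# Step 2: spread each ID's maximum (its first-row flag) over all of its rows
test$V1 <- ave(first_flag, test$ID, FUN = max)
test$V1
# 1 1 1 0 0 0 0 1 1
```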
Wojciech Sobala