data=data.frame("StudentID"=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
"Time"=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
"Group"=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0),
"Class"=c(1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1),
"Test"=c(NA,1,0,NA,1,1,1,1,0,0,1,0,1,0,1,1,1,0,0,0),
"Score"=c(0,1,1,0,1,NA,0,1,NA,0,NA,1,1,1,1,0,0,1,1,1),
"P"=c(NA,3,1,1,1,1,2,NA,3,1,3,3,2,2,2,NA,NA,1,2,2))

The variables Group through P are categorical.

data1: I want to calculate the mode of Test, Score, and P separately for each combination of Group and Class, and then use those modes to impute missing values only where Time = 1.

data2: As a separate step, I want to create data2 from data1: for any missing value of Test or Score at Time T where T > 1, copy down the value from the row above within each group.

In hopes of reaching a data.table solution!

bvowe
  • I updated the second case as well. At first, I didn't know what you meant by copy down. – akrun Mar 18 '20 at 22:52

1 Answer


We can use the Mode function from https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode

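Mode <- function(x) {
  x <- x[!is.na(x)]                           # drop NA so it is never returned as the mode
  if (length(x) == 0L) return(x[NA_integer_]) # all values missing: return a typed NA
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]       # ties go to the value seen first
}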

and then loop over the columns of interest, calculate the mode within each 'Group' and 'Class' combination, and replace the values that are NA where 'Time' is 1

library(data.table)
nm1 <- c("Test", "Score", "P")
# setDT() converts 'data' to a data.table by reference; := then updates it in place
setDT(data)[, (nm1) := lapply(.SD, function(x)
    replace(x, is.na(x) & Time == 1, Mode(x))), by = .(Group, Class), .SDcols = nm1]
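Before imputing, it can help to look at the values that would be filled in. The following is a minimal sketch, not part of the original answer: it rebuilds the relevant columns of the question's data in a separate table `dt` and uses a hypothetical helper `mode_obs` (an assumed name) that returns the most frequent non-missing value, to tabulate the mode of each column per Group and Class combination.

```r
library(data.table)

# Hypothetical helper (name is an assumption): most frequent non-missing value,
# with ties resolved in favour of the value that appears first
mode_obs <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(x[NA_integer_])  # typed NA when nothing is observed
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Same Group, Class, Test, Score and P values as in the question
dt <- data.table(
  Group = rep(c(1, 1, 0, 0), each = 5),
  Class = rep(c(1, 0, 0, 1), each = 5),
  Test  = c(NA, 1, 0, NA, 1,  1, 1, 1, 0, 0,   1, 0, 1, 0, 1,  1, 1, 0, 0, 0),
  Score = c(0, 1, 1, 0, 1,    NA, 0, 1, NA, 0, NA, 1, 1, 1, 1, 0, 0, 1, 1, 1),
  P     = c(NA, 3, 1, 1, 1,   1, 2, NA, 3, 1,  3, 3, 2, 2, 2,  NA, NA, 1, 2, 2)
)

# One row per Group/Class cell, one mode per column
modes <- dt[, lapply(.SD, mode_obs), by = .(Group, Class),
            .SDcols = c("Test", "Score", "P")]
print(modes)
```

Each row of `modes` is the value that a missing entry at Time 1 in that cell would receive.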

For the second case, it would be

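library(zoo)
nm2 <- c("Test", "Score")
# Carry the last observed value forward within each student, so the Time 1 value
# is available to Time 2 and nothing is copied from another student's rows;
# a leading NA at Time 1 has no value above it and is left as NA
data[, (nm2) := lapply(.SD, na.locf0), .SDcols = nm2, by = StudentID]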
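If the zoo dependency is not wanted, data.table's own `nafill()` with `type = "locf"` performs a similar forward fill on numeric columns. The following is a minimal sketch on a small made-up table `toy` (not the question's data). Grouping by StudentID is an assumption here, chosen so that a value is never carried from one student into the next.

```r
library(data.table)

# Toy data (illustrative only): two students, three time points each
toy <- data.table(
  StudentID = c(1, 1, 1, 2, 2, 2),
  Time      = c(1, 2, 3, 1, 2, 3),
  Test      = c(1, NA, 0, NA, 1, NA)
)

# Forward-fill within each student; the leading NA for student 2
# has no earlier value to copy and therefore stays NA
toy[, Test := nafill(Test, type = "locf"), by = StudentID]
print(toy)
```

Note that `nafill()` works on numeric vectors, so categorical variables stored as character or factor would need a different approach such as `zoo::na.locf0`.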
akrun
  • I think it would be good practice to give credit to reused answers: https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode – s_baldur Mar 20 '20 at 16:12