A lot comes together in this question. First off all I would like to segment the data by column c. The subsets are given by the factor c: the levels are 1 to 4. So 4 distinct segments. Next I have two columns. Column a and b. I would like to replace the NA's with the maximum value of each segment specific column. So for example, NA at row 3 and column 'a', this would be 30. (b,3) would be 80, (b,8) would be 50 and (a, 5) would be 80.
I have created the code below that does the job, but now I need to make it automatic (like a for loop) for all segments and columns. How could I do this?
a <- c(10,NA,30,40,NA,60,70,80,90,90,80,90,10,40)
b <- c(80,70,NA,50,40,30,20,NA,0,0,10,69, 40, 90)
c <- c(1,1,1,2,2,2,2,2,3,3,3,4,4,4)
a b c
1: 10 80 1
2: NA 70 1
3: 30 NA 1
4: 40 50 2
5: NA 40 2
6: 60 30 2
7: 70 20 2
8: 80 NA 2
9: 90 0 3
10: 90 0 3
11: 80 10 3
12: 90 69 4
13: 10 40 4
14: 40 90 4
mytable <- data.table(a,b,c)
mytable[which(is.na(mytable[c == 1][,1, with = FALSE]) == TRUE),1] <- max(mytable[c==1,1], na.rm = TRUE)
Unfortunately, this try results in an error:
for(i in unique(mytable$c)){
for(j in unique(c(1:2))){
mytable[which(is.na(mytable[c == i][,j, with = FALSE]) == TRUE),j, with = FALSE] <- max(mytable[c==i][,j, with = FALSE], na.rm = TRUE)
}
}
Error in [<-.data.table
(*tmp*
, which(is.na(mytable[c == i][, j, with = FALSE]) == :
unused argument (with = FALSE)
Surprisingly, this results in an error as well:
for(i in unique(mytable$c)){
for(j in unique(c(1:2))){
mytable[which(is.na(mytable[c == i][,j]) == TRUE),j] <- max(mytable[c==i,j], na.rm = TRUE)
}
}
Error in [.data.table
(mytable, c == i, j) :
j (the 2nd argument inside [...]) is a single symbol but column name 'j' is not found. Perhaps you intended DT[,..j] or DT[,j,with=FALSE]. This difference to data.frame is deliberate and explained in FAQ 1.1.