I have a data frame "comp". Sample for reference:
comp <- data.frame(A=c(1:5), B=c(1,0,1,0,0), C=c(5,2,0,0,NA), D=c(1,3,1,NA,0))
A B C D
1 1 1 5 1
2 2 0 2 3
3 3 1 0 1
4 4 0 0 NA
5 5 0 NA 0
I'd like to loop over every column except the first two and replace each cell with a particular string or NA, depending on both the value in that cell and the value in column 2 (B) of the same row. The rules for column C are:
- If C is positive and B is 1: "Ysnp, Yphen"
- If C is positive and B is 0: "Ysnp, Nphen"
- If C is 0 and B is 1: "Nsnp, Yphen"
- If C is 0 and B is 0: "Nsnp, Nphen"
- If C is NA: NA
These same rules would also apply to column D (just replace C with D in the above rules). For my sample data it would look like this:
A B C D
1 1 1 "Ysnp, Yphen" "Ysnp, Yphen"
2 2 0 "Ysnp, Nphen" "Ysnp, Nphen"
3 3 1 "Nsnp, Yphen" "Ysnp, Yphen"
4 4 0 "Nsnp, Nphen" NA
5 5 0 NA "Nsnp, Nphen"
My real data set has 50+ columns, so writing a separate for loop for each one would be tedious. This is what I tried:
sapply(comp[,-(1:2)], function(snp) {
  for (i in 1:nrow(comp)){
    if (comp$snp[i]!=0 & !is.na(comp$snp[i])){
      if (comp[i, 2]==1) comp$snp[i] <- "Ysnp, Yphen"
      else comp$snp[i] <- "Ysnp, Nphen"
    }
    else if (comp$snp[i]==0 & !is.na(comp$snp[i])){
      if (comp[i, 2]==1) comp$snp[i] <- "Nsnp, Yphen"
      else comp$snp[i] <- "Nsnp, Nphen"
    }
    else comp$snp[i] <- NA
  }
})
However when I run this loop I get the following error:
Error in if (comp$snp[i] != 0 & !is.na(comp$snp[i])) { :
argument is of length zero
I've checked that my data frame does not contain any NULL values, so I'm not sure why the loop is generating this error. I also tried replacing comp$snp[i] with comp[i, snp] throughout the loop, and using apply instead of sapply, but neither solved the problem.
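For what it's worth, the error seems to come from comp$snp itself rather than from NULL values in the data. Inside sapply(), snp is the vector of column values, while comp$snp looks up a column literally named "snp", which does not exist and therefore returns NULL. The sketch below reproduces that and shows one possible vectorised alternative. label_snp is a hypothetical helper name introduced here for illustration, and the sketch assumes the SNP columns contain no negative values and that B is always 0 or 1.

```r
# Sample data from the question
comp <- data.frame(A = 1:5, B = c(1, 0, 1, 0, 0),
                   C = c(5, 2, 0, 0, NA), D = c(1, 3, 1, NA, 0))

# Why the original attempt fails: `comp$snp` asks for a column named "snp",
# not for the column currently held in the function argument `snp`.
is.null(comp$snp)          # TRUE, there is no such column
length(comp$snp[1] != 0)   # 0, so if() has no condition to test

# label_snp() is a hypothetical helper (not from any package).
# It applies the rules to a whole column at once.
# Assumption: snp values are never negative; phen is 0 or 1.
label_snp <- function(snp, phen) {
  out <- paste0(ifelse(snp > 0, "Ysnp", "Nsnp"), ", ",
                ifelse(phen == 1, "Yphen", "Nphen"))
  out[is.na(snp)] <- NA    # NA in the SNP column stays NA
  out
}

# Relabel every column except the first two, leaving comp itself untouched
result <- comp
result[-(1:2)] <- lapply(comp[-(1:2)], label_snp, phen = comp$B)
result
```

Working on whole columns avoids the row-by-row loop, and because the result is assigned back with lapply(), the same call should cover 50+ columns without any per-column code.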