I’m trying to replace Us (or NAs, easy to lead in Us as NAs) in my dataframe with the values that are in the column above or below them. I.e.
0 1 0 1
U U U U
0 1 1 0
Would become
0 1 0 1
0 1 U U
0 1 1 0
I have a for loop to do this, which works on subsets of the data
for(i in 2:((NROW(Sample_table))-1)) {
for(j in 3:NCOL(Sample_table)) {
if((Sample_table[i,j]=="U")&(Sample_table[(i-1),j]==Sample_table[(i+1),j])){
Sample_table[i,j] <- Sample_table[(i+1),j]
}
}
}
(doesn't start at 1:1 because first couple of rows/columns contain positions/names). However, my final dataset is 152 columns and ~6 million rows, so the for loop isn’t a good solution (attempted to do this, ran for a week without finishing). I’ve tried using apply, but can’t work out how to get it to refer to other rows, I’ve tried using ifelse, but can only get it to work within a for loop. Any help or suggestions?
EDIT ###I thought Maurits had solved it below, but when I apply it to a larger dataframe, it doesn't give the expected output:
df <- read.table(text =
"0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
", header = F)
> df
V1 V2 V3 V4 V5 V6 V7 V8
1 0 1 0 1 0 1 1 0
2 U U U U 1 0 1 1
3 0 1 1 0 0 1 0 1
4 0 1 0 1 0 1 1 0
5 U U U U 1 0 1 1
6 0 1 1 0 0 1 0 1
> df2 <- as.data.frame(sapply(df, function(x) replace(x, x[1] == x[3] & x[2]
== "U", x[1])))
> df2
V1 V2 V3 V4 V5 V6 V7 V8
1 1 1 1 2 0 1 1 0
2 1 1 3 3 1 0 1 1
3 1 1 2 1 0 1 0 1
4 1 1 1 2 0 1 1 0
5 1 1 3 3 1 0 1 1
6 1 1 2 1 0 1 0 1
EDIT 2
Compared the methods: apply is quickest (that gets the right answer):
devtools::install_github("olafmersmann/microbenchmarkCore")
devtools::install_github("olafmersmann/microbenchmark")
library(microbenchmark)
mbm <- microbenchmark("apply_wrong_version" = {df <- read.table(text =
"0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
", header = F)
df2 <- as.data.frame(sapply(df, function(x) replace(x, x[1] == x[3] & x[2]
== "U", x[1])))
df2},"forloop" = {df <- read.table(text =
"0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
", header = F)
for(i in 2:((NROW(df))-1)) {
for(j in 1:NCOL(df)) {
if((df[i,j]=="U")&(df[(i-1),j]==df[(i+1),j])){
df[i,j] <- df[(i+1),j]
}
}
}
},"na.locf_version" = {mat=read.table(text =
"0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
", header = F)
mat1=mat
mat1[mat1=='U']=NA
mask=zoo::na.locf(mat1)==zoo::na.locf(mat1,fromLast=T)
mat[mask]=zoo::na.locf(mat1,fromLast=T)[mask]
mat},"apply_version"= {df <- read.table(text =
"0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
0 1 0 1 0 1 1 0
U U U U 1 0 1 1
0 1 1 0 0 1 0 1
", header = F)
df[]<-apply(df, 2, function(x){
#find rows with U
us<-which(x=="U" )
#replace U with value above (if above=below)
x[us]<-ifelse(x[us-1]==x[us+1], x[us-1], "U")
return(x)
})
})
mbm
expr min lq mean median uq max neval cld
apply_wrong_version 671.605 821.334 979.1732 910.816 1020.840 4364.250 100 a
forloop 11809.985 13516.258 14523.5789 14059.863 15238.531 22556.858 100 d
na.locf_version 3754.275 4380.448 5042.3309 4631.510 5314.573 9295.415 100 c
apply_version 986.470 1209.878 1476.4378 1321.878 1492.742 8167.513 100 b