Merge data.frame columns on set number of columns removing na's unless not enough values in row

Question

I'd like to remove the NA values from my columns and merge all columns into four columns, keeping NAs only where a row has fewer than four values.

Say I have data like this,

df <- data.frame('a' = c(1,4,NA,3),
            'b' = c(3,NA,3,NA),
            'c' = c(NA,2,NA,NA),
            'd' = c(4,2,NA,NA),
            'e'= c(NA,5,3,NA),
            'f'= c(1,NA,NA,4),
            'g'= c(NA,NA,NA,4))
#>    a  b  c  d  e  f  g
#> 1  1  3 NA  4 NA  1 NA
#> 2  4 NA  2  2  5 NA NA
#> 3 NA  3 NA NA  3 NA NA
#> 4  3 NA NA NA NA  4  4

My desired outcome would be,

df.desired <- data.frame('a' = c(1,4,3,3),
                     'b' = c(3,2,3,4),
                     'c' = c(4,2,NA,4),
                     'd' = c(1,5,NA,NA))
df.desired
#>   a b  c  d
#> 1 1 3  4  1
#> 2 4 2  2  5
#> 3 3 3 NA NA
#> 4 3 4  4 NA

I don't understand the logic behind your expected outcome. Can you clarify? Values in columns `a`,`b`,`c`,`d` don't seem to match values in the corresponding columns of your original `df`. — Maurits Evers, Jan 30 '18 at 11:55
Possible duplicate: [How to move cells with a value row-wise to the left in a dataframe](https://stackoverflow.com/questions/26651606/how-to-move-cells-with-a-value-row-wise-to-the-left-in-a-dataframe); [Move NAs within dataframe in R](https://stackoverflow.com/questions/25869011/move-nas-within-dataframe-in-r/) — Henrik, Jan 30 '18 at 11:56
this is not pretty `as.data.frame(matrix(t(apply(df,1,function(x){c(x[!is.na(x)],x[is.na(x)])}))[,1:4], nrow=4, dimnames = list(NULL, names(df)[1:4])))` — Eric Fail, Jan 30 '18 at 12:08

score 2 · Answer 1 · answered Jan 30 '18 at 12:01

You could probably have searched SO a bit more and tweaked two existing answers, 1 & 2:

1. Shifting all the numbers ahead of the NAs
2. Removing the columns that contain only NAs

Code:

df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e'= c(NA,5,3,NA),
                 'f'= c(1,NA,NA,4),
                 'g'= c(NA,NA,NA,4))

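# For each row, reorder the columns so the non-NA values come first, then stack the rows
df.new <- do.call(rbind, lapply(1:nrow(df), function(x) t(matrix(df[x, order(is.na(df[x, ]))]))))
colnames(df.new) <- colnames(df)

df.new

# Keep only the columns that are not entirely NA
df.new[, colSums(is.na(df.new)) < nrow(df.new)]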

Output:

> df.new[,colSums(is.na(df.new))<nrow(df.new)]
     a b c  d 
[1,] 1 3 4  1 
[2,] 4 2 2  5 
[3,] 3 3 NA NA
[4,] 3 4 4  NA
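
The output above is a matrix rather than a data frame. If a plain numeric data frame with a fixed number of columns is wanted, the same shift-then-trim idea can be sketched with `apply`. This is only a sketch: `shift_left` and `n_keep` are hypothetical names, not part of the answer above, and it assumes all columns are numeric and that `n_keep` does not exceed `ncol(data)`.

```r
# Hypothetical helper: move the non-NA values of every row to the left
# and keep only the first n_keep columns.
shift_left <- function(data, n_keep = 4) {
  shifted <- t(apply(data, 1, function(row) {
    vals <- unname(row[!is.na(row)])              # non-NA values, original order
    c(vals, rep(NA, length(row) - length(vals)))  # pad back to the full row width
  }))
  out <- as.data.frame(shifted[, seq_len(n_keep), drop = FALSE])
  names(out) <- names(data)[seq_len(n_keep)]      # reuse the first n_keep column names
  out
}

df <- data.frame(a = c(1, 4, NA, 3),
                 b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA),
                 d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA),
                 f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))

res <- shift_left(df)
res
```

Unlike dropping the all-NA columns, trimming to `n_keep` always returns exactly that many columns, so a row with more than `n_keep` non-NA values would lose its extra values.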

score 0 · Answer 2 · answered Jan 30 '18 at 12:12

I believe there are more efficient ways, but here is my attempt:

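# Non-NA values of each row (lapply always returns a list, whatever the row lengths)
x00 <- lapply(1:nrow(df), function(x) df[x, ][!is.na(df[x, ])])
# Pad each row with NA up to the original number of columns
x01 <- lapply(x00, function(x) c(x, rep(NA, ncol(df) - length(x))))
x02 <- as.data.frame(do.call("rbind", x01))
# Drop the columns that are entirely NA
x02 <- x02[, colSums(is.na(x02)) < nrow(x02)]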

score 0 · Answer 3 · answered Jan 30 '18 at 12:23

I have the following solution:

df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e'= c(NA,5,3,NA),
                 'f'= c(1,NA,NA,4),
                 'g'= c(NA,NA,NA,4))
df
x <-list()
for(i in 1:nrow(df)){
  x[[i]] <- df[i,]
  x[[i]] <- x[[i]][!is.na(x[[i]])]
  # x[[i]] <- as.data.frame(x[[i]], stringsAsFactors = FALSE)
  # Pad with NA (not 0) up to the 4 columns asked for in the question
  x[[i]] <- c(x[[i]], rep(NA, 4 - length(x[[i]])))
}
result <- do.call(rbind, x)
result
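
As a possible variant of the loop above, the padding step can be left to R's `length<-`, which extends a vector with NA or truncates it to the requested length. This is a sketch under assumptions: `n_cols` and `result2` are hypothetical names introduced here, and all columns are assumed to be numeric.

```r
df <- data.frame(a = c(1, 4, NA, 3),
                 b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA),
                 d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA),
                 f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))

n_cols <- 4  # assumed target width, taken from the question

rows <- lapply(seq_len(nrow(df)), function(i) {
  vals <- unlist(df[i, ], use.names = FALSE)  # row i as a plain numeric vector
  vals <- vals[!is.na(vals)]                  # drop the NAs
  length(vals) <- n_cols                      # pad with NA or truncate to n_cols
  vals
})

result2 <- do.call(rbind, rows)               # one matrix row per data frame row
colnames(result2) <- names(df)[seq_len(n_cols)]
result2
```

Because `length<-` also truncates, a row with more than `n_cols` non-NA values is silently cut down to its first `n_cols` values.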
