
I'd like to remove the NA values from each row and shift the remaining values to the left, collapsing all columns into four columns, while keeping NAs where a row has fewer than four non-NA values.

Say I have data like this,

df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e' = c(NA,5,3,NA),
                 'f' = c(1,NA,NA,4),
                 'g' = c(NA,NA,NA,4))
df
#>    a  b  c  d  e  f  g
#> 1  1  3 NA  4 NA  1 NA
#> 2  4 NA  2  2  5 NA NA
#> 3 NA  3 NA NA  3 NA NA
#> 4  3 NA NA NA NA  4  4

My desired outcome would be,

df.desired <- data.frame('a' = c(1,4,3,3),
                         'b' = c(3,2,3,4),
                         'c' = c(4,2,NA,4),
                         'd' = c(1,5,NA,NA))
df.desired
#>   a b  c  d
#> 1 1 3  4  1
#> 2 4 2  2  5
#> 3 3 3 NA NA
#> 4 3 4  4 NA
Eric Fail
Neal Barsch
  • I don't understand the logic behind your expected outcome. Can you clarify? Values in columns `a`,`b`,`c`,`d` don't seem to match values in the corresponding columns of your original `df`. – Maurits Evers Jan 30 '18 at 11:55
  • Possible duplicate: [How to move cells with a value row-wise to the left in a dataframe](https://stackoverflow.com/questions/26651606/how-to-move-cells-with-a-value-row-wise-to-the-left-in-a-dataframe); [Move NAs within dataframe in R](https://stackoverflow.com/questions/25869011/move-nas-within-dataframe-in-r/) – Henrik Jan 30 '18 at 11:56
  • This is not pretty: `as.data.frame(matrix(t(apply(df,1,function(x){c(x[!is.na(x)],x[is.na(x)])}))[,1:4], nrow=4, dimnames = list(NULL, names(df)[1:4])))` – Eric Fail Jan 30 '18 at 12:08
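Eric Fail's one-liner can be split into readable steps. A minimal sketch, assuming the same `df` as in the question (redefined so the snippet runs on its own); `shifted` and `out` are illustrative names, and taking the first four columns with `[, 1:4]` replaces the hard-coded `nrow = 4`:

```r
df <- data.frame(a = c(1, 4, NA, 3), b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA), d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA), f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))

# apply() works on as.matrix(df) row by row: each row becomes its non-NA
# values followed by its NA values; t() turns the result back into rows
shifted <- t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))

# Keep the first four columns (values beyond the fourth would be dropped)
out <- as.data.frame(shifted[, 1:4])
names(out) <- names(df)[1:4]
out
```

Because `[, 1:4]` silently discards anything past the fourth position, this only matches the desired output when no row has more than four non-NA values.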

3 Answers


You could probably have searched SO a bit more and adapted two existing answers:

  1. Shifting all the numbers with NAs
  2. Removing the columns where all values are NA

Result:

df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e'= c(NA,5,3,NA),
                 'f'= c(1,NA,NA,4),
                 'g'= c(NA,NA,NA,4))

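# For each row, put the non-NA values first (order() is stable, so the
# original column order is kept), then bind the rows into a numeric matrix
df.new <- do.call(rbind, lapply(seq_len(nrow(df)), function(x) {
  row <- unlist(df[x, ])
  row[order(is.na(row))]
}))
colnames(df.new) <- colnames(df)

df.new

# Drop the columns that are NA in every row
df.new[, colSums(is.na(df.new)) < nrow(df.new)]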

Output:

> df.new[,colSums(is.na(df.new))<nrow(df.new)]
     a b  c  d
[1,] 1 3  4  1
[2,] 4 2  2  5
[3,] 3 3 NA NA
[4,] 3 4  4 NA
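The per-row `lapply()` step can also be written as a single vectorized `order()` call over the whole matrix. A sketch, assuming all columns are numeric so that `as.matrix()` returns a numeric matrix; `m`, `idx`, and `shifted` are illustrative names, not part of the answer above:

```r
df <- data.frame(a = c(1, 4, NA, 3), b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA), d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA), f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))

m <- as.matrix(df)
# Sort all cells by row, then by NA status; order() is stable and m is stored
# column by column, so ties within a row keep their original column order
idx <- order(row(m), is.na(m))
shifted <- matrix(m[idx], nrow = nrow(m), byrow = TRUE,
                  dimnames = list(NULL, colnames(m)))

# Drop the columns that are NA in every row, as in the answer above
shifted <- shifted[, colSums(is.na(shifted)) < nrow(shifted)]
shifted
```

This avoids one R-level function call per row, which may matter on large data frames; for four rows either version is fine.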
amrrs

There are probably more efficient ways, but here is my attempt:

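# Non-NA values of each row; lapply() (unlike sapply()) always returns a list
x00 <- lapply(seq_len(nrow(df)), function(x) df[x, ][!is.na(df[x, ])])
# Pad each row with NA back to ncol(df) values
x01 <- lapply(x00, function(x) c(x, rep(NA, ncol(df) - length(x))))
x02 <- as.data.frame(do.call(rbind, x01))
# Drop the columns that are NA in every row
x02 <- x02[, colSums(is.na(x02)) < nrow(x02)]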
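The pad-and-bind idea above can be wrapped into a function with a fixed target width. A minimal sketch; the name `collapse_rows()` and its `n` argument are hypothetical, it assumes `n <= ncol(df)`, and it stops with an error rather than silently dropping values when a row has more than `n` non-NA entries:

```r
# Hypothetical helper: shift each row's non-NA values left and return
# exactly n columns, padded with NA (assumes n <= ncol(df))
collapse_rows <- function(df, n = 4) {
  vals <- lapply(seq_len(nrow(df)), function(i) {
    r <- unlist(df[i, ], use.names = FALSE)
    r[!is.na(r)]                      # non-NA values in column order
  })
  if (any(lengths(vals) > n)) stop("a row has more than ", n, " non-NA values")
  padded <- lapply(vals, function(v) c(v, rep(NA, n - length(v))))
  out <- as.data.frame(do.call(rbind, padded))
  names(out) <- names(df)[seq_len(n)]
  out
}

df <- data.frame(a = c(1, 4, NA, 3), b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA), d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA), f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))
res <- collapse_rows(df, n = 4)
res
```

Unlike dropping all-NA columns afterwards, a fixed `n` always yields the same number of columns, which matches the question's "four columns" requirement even when every row happens to have fewer values.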
Antonios

I have the following solution:

df <- data.frame('a' = c(1,4,NA,3),
                 'b' = c(3,NA,3,NA),
                 'c' = c(NA,2,NA,NA),
                 'd' = c(4,2,NA,NA),
                 'e'= c(NA,5,3,NA),
                 'f'= c(1,NA,NA,4),
                 'g'= c(NA,NA,NA,4))
df
x <- list()
for (i in seq_len(nrow(df))) {
  x[[i]] <- df[i, ]
  x[[i]] <- x[[i]][!is.na(x[[i]])]                  # keep the non-NA values
  x[[i]] <- c(x[[i]], rep(NA, 4 - length(x[[i]])))  # pad with NA up to 4 (assumes <= 4 non-NA per row)
}
result <- do.call(rbind, x)
colnames(result) <- names(df)[1:4]
result
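The same loop can fill a preallocated matrix instead of growing a list and binding it at the end. A sketch, assuming (as above) that no row has more than `n = 4` non-NA values; `n` and `v` are illustrative names:

```r
df <- data.frame(a = c(1, 4, NA, 3), b = c(3, NA, 3, NA),
                 c = c(NA, 2, NA, NA), d = c(4, 2, NA, NA),
                 e = c(NA, 5, 3, NA), f = c(1, NA, NA, 4),
                 g = c(NA, NA, NA, 4))

n <- 4  # target number of columns, as in the question
result <- matrix(NA_real_, nrow = nrow(df), ncol = n,
                 dimnames = list(NULL, names(df)[seq_len(n)]))
for (i in seq_len(nrow(df))) {
  v <- unlist(df[i, ], use.names = FALSE)
  v <- v[!is.na(v)]
  # Fill only the leading cells; the remaining cells keep their NA.
  # A row with more than n values would raise "subscript out of bounds".
  result[i, seq_along(v)] <- v
}
result
```

Starting from an all-NA matrix means the padding comes for free, so no `rep()` call is needed.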
Mislav