Remove NA's by keeping all the populated cells in new columns using R

Question

How can I drop all the missing values without deleting entire columns, and instead build new columns from just the populated cells? For example, going from this

A   B   C   D
1   NA  2   NA
NA  3   NA  4
NA  5   6   NA

(data1) to a data set that contains only the populated cells.

Below I have created a small working example to test a solution.

> # Create example dataset (data1)
> data1 <- data.frame(matrix(c(1,NA,2,NA,NA,3,NA,4,NA,5,6,NA), nrow = 3, byrow = TRUE))
> colnames(data1) <- c("A","B","C","D")

> print(data1)
   A  B  C  D
1  1 NA  2 NA
2 NA  3 NA  4
3 NA  5  6 NA

> # Create new dataset?

Would each row of the expected output data frame _always_ have two columns, or could there be more or less than 2? — Tim Biegeleisen, Apr 23 '21 at 09:14
@TimBiegeleisen, there are many more, 360 in total actually. There could also be more populated cells in one row than in another. Say row 1 has 5 and row 2 has 6. The new dataset would then have 5 populated cells and 1 NA in the first row, and 6 populated cells in the second row. — user4933, Apr 23 '21 at 09:18
1. Will you always have an even number of columns? 2. Will a column always have exactly one pair to combine with? 3. Do you always combine consecutive columns? If the answer to any of these questions is No, can you change your example to one that covers these conditions? — Ronak Shah, Apr 23 '21 at 12:12

score 0 · Accepted Answer · answered Apr 23 '21 at 09:39

Here is a potential solution using akrun's/Valentin's answer from this question.

Let's say the data is

> data1 <- data.frame(matrix(c(1,NA,2,NA,NA,3,NA,4,NA,5,NA,NA), nrow = 3, byrow = TRUE))
> data1
  X1 X2 X3 X4
1  1 NA  2 NA
2 NA  3 NA  4
3 NA  5 NA NA

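Then collect the non-NA values of each row and pad every row to the length of the longest one with

rows <- lapply(seq_len(nrow(data1)), function(i) { x <- unlist(data1[i, ]); x[!is.na(x)] })
df1 <- t(sapply(rows, "length<-", max(lengths(rows))))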

to arrive at

> df1
     X1 X3
[1,]  1  2
[2,]  3  4
[3,]  5 NA
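
As a more explicit alternative to the sapply approach above, here is a minimal sketch that does the same left shift with a plain loop. The helper name shift_left is hypothetical (it is not from the answer or from any package), and the sketch assumes all columns hold the same type, since the data frame is converted to a matrix first.

```r
# Hypothetical helper (not part of the original answer): move the non-NA
# values of each row to the left and pad the right-hand side with NA.
shift_left <- function(df) {
  m <- as.matrix(df)
  # Width of the result: the largest number of non-NA cells in any row
  n_max <- max(rowSums(!is.na(m)))
  out <- matrix(NA, nrow = nrow(m), ncol = n_max)
  for (i in seq_len(nrow(m))) {
    vals <- m[i, !is.na(m[i, ])]
    out[i, seq_along(vals)] <- vals
  }
  as.data.frame(out)
}

# The example data from the question
data1 <- data.frame(A = c(1, NA, NA), B = c(NA, 3, 5),
                    C = c(2, NA, 6), D = c(NA, 4, NA))
result <- shift_left(data1)
print(result)
```

For the question's data1 this gives three rows holding 1 2, 3 4 and 5 6. Rows with fewer populated cells than the widest row are padded with NA on the right, which matches the behaviour described in the comments for rows of unequal length.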
