Combine column to remove NA's

Question

I have some columns in R and for each row there will only ever be a value in one of them, the rest will be NA's. I want to combine these into one column with the non-NA value. Does anyone know of an easy way of doing this. For example I could have as follows:

data <- data.frame('a' = c('A','B','C','D','E'),
                   'x' = c(1,2,NA,NA,NA),
                   'y' = c(NA,NA,3,NA,NA),
                   'z' = c(NA,NA,NA,4,5))

So I would have

'a' 'x' 'y' 'z'  
 A   1   NA  NA  
 B   2   NA  NA  
 C  NA   3   NA  
 D  NA   NA  4  
 E  NA   NA  5

And I would to get

 'a' 'mycol'  
  A   1  
  B   2  
  C   3  
  D   4  
  E   5

The names of the columns containing NA changes depending on code earlier in the query so I won't be able to call the column names explicitly, but I have the column names of the columns which contains NA's stored as a vector e.g. in this example cols <- c('x','y','z'), so could call the columns using data[, cols].

Any help would be appreciated.

Thanks

score 72 · Accepted Answer · answered Apr 15 '18 at 09:32

A dplyr::coalesce based solution could be as:

data %>% mutate(mycol = coalesce(x,y,z)) %>%
         select(a, mycol)
#   a mycol
# 1 A     1
# 2 B     2
# 3 C     3
# 4 D     4
# 5 E     5

Data

data <- data.frame('a' = c('A','B','C','D','E'),
                 'x' = c(1,2,NA,NA,NA),
                 'y' = c(NA,NA,3,NA,NA),
                 'z' = c(NA,NA,NA,4,5))

Sven Hohenstein · Answer 2 · 2016-10-09T15:31:58.073

23

You can use unlist to turn the columns into one vector. Afterwards, na.omit can be used to remove the NAs.

cbind(data[1], mycol = na.omit(unlist(data[-1])))

   a mycol
x1 A     1
x2 B     2
y3 C     3
z4 D     4
z5 E     5

edited Oct 09 '16 at 15:31

answered Jan 28 '13 at 13:56

Sven Hohenstein

80,497
17
145
168

This solution only works if the columns are ordered in function of the available data. The first column is not paired with the right value with other datasets like: `data <- data.frame('a' = c('A','B','C','D','E'),'x' = c(NA,NA,3,NA,NA),'y' = c(1,2,NA,NA,NA),'z' = c(NA,NA,NA,4,5))` – julia Sep 16 '14 at 09:41
2

This should deal with the issue raised by @julia: `cbind.data.frame(data[1], mycol=c(na.omit(c(t(data[, -1])))))` – jbaums Nov 18 '14 at 12:32
I added a specific extension of this problem as a new post here http://stackoverflow.com/questions/40878660/in-r-combine-column-to-remove-nas-yet-prioritize-specific-replacements. – val Nov 30 '16 at 01:19

Jonas Lindeløv · Answer 3 · 2022-12-21T09:15:20.083

Here's a more general (but even simpler) solution which extends to all column types (factors, characters etc.) with non-ordered NA's. The strategy is simply to merge the non-NA values of other columns into your merged column using is.na for indexing:

data$mycol = data$x  # your new merged column. Start with x
data$mycol[!is.na(data$y)] = data$y[!is.na(data$y)]  # merge with y
data$mycol[!is.na(data$z)] = data$z[!is.na(data$z)]  # merge with z

> data
  a  x  y  z mycol
1 A  1 NA NA 1
2 B  2 NA NA 2
3 C NA  3 NA 3
4 D NA NA  4 4
5 E NA NA  5 5

Note that this will overwrite existing values in mycol if there are several non-NA values in the same row. If you have a lot of columns you could automate this by looping over colnames(data).

score 15 · Answer 4 · answered Jan 28 '13 at 14:02

15

I would use rowSums() with the na.rm = TRUE argument:

cbind.data.frame(a=data$a, mycol = rowSums(data[, -1], na.rm = TRUE))

which gives:

> cbind.data.frame(a=data$a, mycol = rowSums(data[, -1], na.rm = TRUE))
  a mycol
1 A     1
2 B     2
3 C     3
4 D     4
5 E     5

You have to call the method directly (cbind.data.frame) as the first argument above is not a data frame.

answered Jan 28 '13 at 14:02

Gavin Simpson

170,508
25
396
453

4

Good solution. But what about character data? – Artem Klevtsov Sep 04 '15 at 14:53
As the OP's example didn't contain character data (other than the identifier column) I didn't consider that; other options here will handle that but at a cost; for example, unlisting a data frame with many rows and columns is going to be slow. If you have non-numeric data use another option, but if you don't this is a pretty trivial and relatively quick solution. – Gavin Simpson Oct 25 '17 at 16:26

score 5 · Answer 5 · answered Jan 28 '13 at 13:57

5

Something like this ?

data.frame(a=data$a, mycol=apply(data[,-1],1,sum,na.rm=TRUE))

gives :

answered Jan 28 '13 at 13:57

juba

47,631
14
113
118

Yes, you're right, thanks for pointing this out ! I won't edit my post because it would make it a duplicate of Gavin Simpson's one. – juba Jan 28 '13 at 14:04
1

Thanks juba, this works providing the columns are always numeric as they are in my example, but in reality the columns are sometimes character so I have used the above answer from Sven. – user1165199 Jan 28 '13 at 14:11

score 1 · Answer 6 · answered Sep 20 '16 at 18:55

1

max works too. Also works on strings vectors.

cbind(data[1], mycol=apply(data[-1], 1, max, na.rm=T))

answered Sep 20 '16 at 18:55

drollix

111
1

score 1 · Answer 7 · answered May 03 '19 at 21:13

One possibility using dplyr and tidyr could be:

data %>%
 gather(variables, mycol, -1, na.rm = TRUE) %>%
 select(-variables)

   a mycol
1  A     1
2  B     2
8  C     3
14 D     4
15 E     5

Here it transforms the data from wide to long format, excluding the first column from this operation and removing the NAs.

score 0 · Answer 8 · edited May 23 '17 at 12:34

0

In a related link (suppress NAs in paste()) I present a version of paste with a na.rm option (with the unfortunate name of paste5).

With this the code becomes

cols <- c("x", "y", "z")
cbind.data.frame(a = data$a, mycol = paste2(data[, cols], na.rm = TRUE))

The output of paste5 is a character, which works if you have character data otherwise you'll need to coerce to the type you want.

edited May 23 '17 at 12:34

Community

1
1

answered Dec 16 '15 at 10:37

JWilliman

3,558
32
36

Ferroao · Answer 9 · 2017-08-25T19:47:10.980

Though this is not the OP case, it seems some people like the approach based on sums, how about thinking in mean and mode, to make the answer more universal. This answer matches the title, which is what many people will find.

data <- data.frame('a' = c('A','B','C','D','E'),
                   'x' = c(1,2,NA,NA,9),
                   'y' = c(NA,6,3,NA,5),
                   'z' = c(NA,NA,NA,4,5))

splitdf<-split(data[,c(2:4)], seq(nrow(data[,c(2:4)])))

data$mean<-unlist(lapply(splitdf, function(x)  mean(unlist(x), na.rm=T) ) )
data$mode<-unlist(lapply(splitdf, function(x)  {
  tab <- tabulate(match(x, na.omit(unique(unlist(x) )))); 
                  paste(na.omit(unique(unlist(x) ))[tab == max(tab) ], collapse = ", " )}) )

data
  a  x  y  z     mean mode
1 A  1 NA NA 1.000000    1
2 B  2  6 NA 4.000000 2, 6
3 C NA  3 NA 3.000000    3
4 D NA NA  4 4.000000    4
5 E  9  5  5 6.333333    5

Bharadwaj A K · Answer 10 · 2019-10-15T12:56:01.923

0

If you want to stick with base,

data <- data.frame('a' = c('A','B','C','D','E'),'x' = c(1,2,NA,NA,NA),'y' = c(NA,NA,3,NA,NA),'z' = c(NA,NA,NA,4,5))
data[is.na(data)]<-","
data$mycol<-paste0(data$x,data$y,data$z)
data$mycol <- gsub(',','',data$mycol)

edited Oct 15 '19 at 12:56

answered Oct 15 '19 at 12:48

Bharadwaj A K

59
4

Combine column to remove NA's

10 Answers10

Linked

Related