Sum of two Columns of Data Frame with NA Values

Question

I have a data frame with some NA values. I need the sum of two of the columns. If a value is NA, I need to treat it as zero.

a  b c d
1  2 3 4
5 NA 7 8

Column e should be the sum of b and c:

e
5
7

I have tried a lot of things, and done two dozen searches with no luck. It seems like a simple problem. Any help would be appreciated!

If NA are to be treated as 0, why not transform them to 0 and then do the operation? — erasmortg, Jul 16 '15 at 18:05

Rorschach · Accepted Answer · 2017-08-18T13:20:02.960

49

dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
#   a  b c d e
# 1 1  2 3 4 5
# 2 5 NA 7 8 7

edited Aug 18 '17 at 13:20

answered Jul 16 '15 at 18:02

Rorschach

31,301
5
78
129

Thanks! I knew there had to be a simple solution. I kept trying to apply the sum function by row, with no luck. – StatDance Jul 16 '15 at 18:06
You can do this easily with `apply` too, though `rowSums` is vectorized. BTW, the best performance will be achieved by explicitly converting to matrix, such as `rowSums(as.matrix...`. – David Arenburg Jul 16 '15 at 18:10
Hi, I'm trying to expand this answer to a list of data frames. I tryed so far lapply(dat, function(x) x$e <- rowSums(x[,c("b", "c", "d")], na.rm=T)) and dat$e <- lapply(dat, function(x) rowSums(x[,c("b", "c", "d")], na.rm=T)) without success. Do you have an idea? Thanks – N.Varela Sep 09 '16 at 21:00

score 32 · Answer 2 · answered Sep 17 '18 at 15:13

32

dplyr solution, taken from here:

library(dplyr)
dat %>% 
    rowwise() %>% 
    mutate(e = sum(b, c, na.rm = TRUE))

answered Sep 17 '18 at 15:13

David Rubinger

3,580
1
20
29

erasmortg · Answer 3 · 2015-07-16T18:55:28.657

3

Here is another solution, with concatenated ifelse():

 dat$e <- ifelse(is.na(dat$b) & is.na(dat$c), dat$e <-0, ifelse(is.na(dat$b), dat$e <- 0 + dat$c, dat$b + dat$c))
 #  a  b c d e
 #1 1  2 3 4 5
 #2 5 NA 7 8 7

Edit, here is another solution that uses with as suggested by @kasterma in the comments, this is much more readable and straightforward:

 dat$e <- with(dat, ifelse(is.na(b) & is.na(c ), 0, ifelse(is.na(b), 0 + c, b + c)))

edited Jul 16 '15 at 18:55

answered Jul 16 '15 at 18:09

erasmortg

3,246
1
17
34

1

Perfect place where `with` will improve things: `dat$e <- with(dat, ifelse(is.na(b) & is.na(c ), 0, ifelse(is.na(b), 0 + c, b + c)))`. – kasterma Jul 16 '15 at 18:47
I agree, much more readable! I edited the answer with your solution as well, if that's ok – erasmortg Jul 16 '15 at 18:53
totally fine @erasmortg – kasterma Jul 16 '15 at 18:55
4

This neither efficient or robust. If you'd have ten columns you will have to nest 10 `ifelse` statements and who knows how many other statements too. – David Arenburg Jul 16 '15 at 19:03

score 3 · Answer 4 · answered Apr 24 '19 at 14:47

if you want to keep NA if both columns has it you can use:

Data, sample:

dt <- data.table(x = sample(c(NA, 1, 2, 3), 100, replace = T), y = sample(c(NA, 1, 2, 3), 100, replace = T))

Solution:

dt[, z := ifelse(is.na(x) & is.na(y), NA_real_, rowSums(.SD, na.rm = T)), .SDcols = c("x", "y")]

(the data.table way)

score 3 · Answer 5 · answered Jul 30 '20 at 19:25

I hope that it may help you

Some cases you have a few columns that are not numeric. This approach will serve you both. Note that: c_across() for dplyr version 1.0.0 and later

df <- data.frame(
  TEXT = c("text1", "text2"), a = c(1,5), b = c(2, NA), c = c(3,7), d = c(4,8))

df2 <- df %>% 
  rowwise() %>% 
  mutate(e = sum(c_across(a:d), na.rm = TRUE))
# A tibble: 2 x 6
# Rowwise: 
# TEXT        a     b     c     d     e
# <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 text1     1     2     3     4    10
# 2 text2     5    NA     7     8    20

LMc · Answer 6 · 2023-08-23T17:26:56.733

0

dplyr

rowwise is really inefficient for even moderately sized data frames. If there is a row-wise variant that will be much faster. For summation this would be rowSums. You can use pick wrapped in rowSums to tidy-select columns you want to sum across:

df |>
  mutate(e = rowSums(pick(c:d), na.rm = T))
#   a  b c d  e
# 1 1  2 3 4  7
# 2 5 NA 7 8 15

edited Aug 23 '23 at 17:26

answered Aug 23 '23 at 17:21

LMc

12,577
3
31
43

Sum of two Columns of Data Frame with NA Values

6 Answers6

dplyr

Linked

Related