36

I have a data frame with some NA values. I need the sum of two of the columns. If a value is NA, I need to treat it as zero.

a  b c d
1  2 3 4
5 NA 7 8

Column e should be the sum of b and c:

e
5
7

I have tried a lot of things, and done two dozen searches with no luck. It seems like a simple problem. Any help would be appreciated!

StatDance
  • 377
  • 1
  • 3
  • 5

6 Answers6

49
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
#   a  b c d e
# 1 1  2 3 4 5
# 2 5 NA 7 8 7
Rorschach
  • 31,301
  • 5
  • 78
  • 129
  • Thanks! I knew there had to be a simple solution. I kept trying to apply the sum function by row, with no luck. – StatDance Jul 16 '15 at 18:06
  • You can do this easily with `apply` too, though `rowSums` is vectorized. BTW, the best performance will be achieved by explicitly converting to matrix, such as `rowSums(as.matrix...`. – David Arenburg Jul 16 '15 at 18:10
  • Hi, I'm trying to expand this answer to a list of data frames. I tryed so far lapply(dat, function(x) x$e <- rowSums(x[,c("b", "c", "d")], na.rm=T)) and dat$e <- lapply(dat, function(x) rowSums(x[,c("b", "c", "d")], na.rm=T)) without success. Do you have an idea? Thanks – N.Varela Sep 09 '16 at 21:00
32

dplyr solution, taken from here:

library(dplyr)
dat %>% 
    rowwise() %>% 
    mutate(e = sum(b, c, na.rm = TRUE))
David Rubinger
  • 3,580
  • 1
  • 20
  • 29
3

Here is another solution, with concatenated ifelse():

 dat$e <- ifelse(is.na(dat$b) & is.na(dat$c), dat$e <-0, ifelse(is.na(dat$b), dat$e <- 0 + dat$c, dat$b + dat$c))
 #  a  b c d e
 #1 1  2 3 4 5
 #2 5 NA 7 8 7

Edit, here is another solution that uses with as suggested by @kasterma in the comments, this is much more readable and straightforward:

 dat$e <- with(dat, ifelse(is.na(b) & is.na(c ), 0, ifelse(is.na(b), 0 + c, b + c)))
erasmortg
  • 3,246
  • 1
  • 17
  • 34
3

if you want to keep NA if both columns has it you can use:

Data, sample:

dt <- data.table(x = sample(c(NA, 1, 2, 3), 100, replace = T), y = sample(c(NA, 1, 2, 3), 100, replace = T))

Solution:

dt[, z := ifelse(is.na(x) & is.na(y), NA_real_, rowSums(.SD, na.rm = T)), .SDcols = c("x", "y")]

(the data.table way)

K. Peltzer
  • 326
  • 3
  • 7
3

I hope that it may help you

Some cases you have a few columns that are not numeric. This approach will serve you both. Note that: c_across() for dplyr version 1.0.0 and later

df <- data.frame(
  TEXT = c("text1", "text2"), a = c(1,5), b = c(2, NA), c = c(3,7), d = c(4,8))

df2 <- df %>% 
  rowwise() %>% 
  mutate(e = sum(c_across(a:d), na.rm = TRUE))
# A tibble: 2 x 6
# Rowwise: 
# TEXT        a     b     c     d     e
# <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 text1     1     2     3     4    10
# 2 text2     5    NA     7     8    20
Tho Vu
  • 1,304
  • 2
  • 8
  • 20
0

dplyr

rowwise is really inefficient for even moderately sized data frames. If there is a row-wise variant that will be much faster. For summation this would be rowSums. You can use pick wrapped in rowSums to tidy-select columns you want to sum across:

df |>
  mutate(e = rowSums(pick(c:d), na.rm = T))
#   a  b c d  e
# 1 1  2 3 4  7
# 2 5 NA 7 8 15
LMc
  • 12,577
  • 3
  • 31
  • 43