Sum NAs across columns using dplyr

Question

I have 4 columns in a dataframe of 244 columns. I need to do a sum over these columns, which can be done with a simple sum function. However, the sum is not taking into consideration the nas. So when I run:

df <- d%>% 
rowwise() %>% 
mutate(DV = sum(x1, x2, x3, x4, na.rm=TRUE))

I am getting 0, when all the values are NA, I would like to get NA when all the values in the x1[2], x2[2], x3[2] and x4[2] are zero. I have been scratching my head for 3 hours on this, and no joy. I did create (gleaned) a function from the internet, but it is still not working:

sum0 <- function(x, ...){if(sum(is.na(x))==4) return(NA_real_) else(sum(x, ..., na.rm=TRUE))} 

df <- d%>% 
rowwise() %>% 
mutate(DV = sum0(x1, x2, x3, x4, na.rm=TRUE))

It is not working because the is.na value is not counting the values correctly. I am really stuck here, any help would be very appreciated.

Hypothetical data

 # Create a, b, c, d variables
  a <- c('a1', 'a2', 'a3', 'a4')
  b <- c(10, NA, 30, 40)
  c <- c(2.5, NA, 8, 1)
  d <- c(2.5, NA, 10, 7)
  e <- c(2.5, NA, 10, 7)
  # Join the variables to create a data frame

  df <- data.frame(a, b, c, d, e)
  dfx <- df %>% rowwise() %>% mutate(DV = sum0(c(b,c,d,e)), na.rm = TRUE)

Here I would want the value in DV[2] to be NA, the rest summed normally.

Please provide a sample of your data with `dput` and the expected output. — AlexB, Feb 22 '20 at 09:59
I provided a working example, note there are 244 columns so not so easy to use the select(.,-a) command — Lowpar, Feb 22 '20 at 10:11
What if you replace those zeros by checking all the column values of the original dataset were `NA`s by applying `all(is.na(x))` per column? A rule based on something like `apply(df, 2, function(x) all(is.na(x)))` — Bruno Zamengo, Feb 22 '20 at 10:25
Unfortunately 0 is an acceptable response, so cannot do this. — Lowpar, Feb 22 '20 at 10:33
Related https://stackoverflow.com/questions/33806575/ignore-na-in-dplyr-row-sum & https://dplyr.tidyverse.org/reference/rowwise.html — Tung, Jul 15 '20 at 01:14

tmfmnk · Accepted Answer · 2020-02-22T11:27:02.430

4

You can do:

df %>%
 mutate(DV = rowSums(select(., b:e)))

   a  b   c    d    e   DV
1 a1 10 2.5  2.5  2.5 17.5
2 a2 NA  NA   NA   NA   NA
3 a3 30 8.0 10.0 10.0 58.0
4 a4 40 1.0  7.0  7.0 55.0

If there could be rows with just a few NAs:

df %>%
 mutate(DV = rowSums(select(., b:e), na.rm = TRUE) * NA ^ (rowSums(!is.na(select(., b:e))) == 0))

Alternatively, you can also do:

df %>%
 filter_at(vars(b:e), any_vars(!is.na(.))) %>%
 mutate(DV = rowSums(select(., b:e), na.rm = TRUE)) %>%
 bind_rows(df %>%
            filter_at(vars(b:e), all_vars(is.na(.))))

edited Feb 22 '20 at 11:27

answered Feb 22 '20 at 11:09

tmfmnk

38,881
4
47
67

Nice! Never knew you could just use select in the rowSums function. I always just used `rowSums(.[2:5])` but I like this better. – Annet Feb 22 '20 at 11:18
@Annet you can even use all the select helpers as `starts_with()`, `matches()` etc. :) – tmfmnk Feb 22 '20 at 11:21

Sum NAs across columns using dplyr

1 Answers1