RowSums NA + NA gives 0

Question

I'm trying to understand what is (to me) a weird behavior of the function rowSums. Imagine I have this super simple data frame:

a = c(NA, NA,3)
b = c(2,NA,2)
df = data.frame(a,b)
df
   a  b
1 NA  2
2 NA NA
3  3  2

Now I want a third column that is the sum of the other two. I cannot simply use + because of the NAs:

df$c <- df$a + df$b
df
   a  b  c
1 NA  2 NA
2 NA NA NA
3  3  2  5

But if I use rowSums, a row in which every value is NA is computed as 0, while a row with only one NA works fine:

df$d <- rowSums(df, na.rm = TRUE)
df
   a  b  c  d
1 NA  2 NA  2
2 NA NA NA  0
3  3  2  5 10

Am I missing something?

Thanks to all

I was wondering why no one had answered `base::psum` – rawr Jul 23 '16 at 18:19

Zheyuan Li · Answer 1 · 2016-07-23T18:10:46.830

Because

sum(numeric(0))
# 0

Once you use na.rm = TRUE in rowSums, the NAs in the second row are dropped, which leaves numeric(0). The sum of that empty vector is 0.
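To make the empty-sum point concrete, here is a minimal sketch. The names x, x_clean and m are made up for illustration, and m is built with cbind so that it mirrors columns a and b of the question.

```r
# Dropping the NAs from an all-NA vector leaves an empty numeric vector.
x <- c(NA_real_, NA_real_)
x_clean <- x[!is.na(x)]
length(x_clean)        # 0
sum(x_clean)           # 0, the sum of nothing
sum(x, na.rm = TRUE)   # 0, the same thing in one step

# rowSums(..., na.rm = TRUE) does this row by row.
m <- cbind(a = c(NA, NA, 3), b = c(2, NA, 2))
row_totals <- rowSums(m, na.rm = TRUE)
row_totals             # 2 0 5
```

The second row has nothing left to add once its NAs are removed, so it gets the empty sum, 0.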

If you want to keep NA for rows that are entirely NA, it takes two steps. I recommend writing a small function for this purpose:

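my_rowSums <- function(x) {
  # convert once, so both rowSums calls below receive a matrix
  if (is.data.frame(x)) x <- as.matrix(x)
  z <- base::rowSums(x, na.rm = TRUE)
  # rows with zero non-NA values become NA
  z[!base::rowSums(!is.na(x))] <- NA
  z
}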

my_rowSums(df)
# [1]  2 NA 10
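Note that in the question df already contains the extra column c when the row sums are taken, which is why the third row comes out as 10 rather than 5. A minimal sketch, assuming you only want to add the original columns a and b (the result column name s is hypothetical), is to pass just those columns:

```r
my_rowSums <- function(x) {
  if (is.data.frame(x)) x <- as.matrix(x)
  z <- rowSums(x, na.rm = TRUE)
  z[!rowSums(!is.na(x))] <- NA
  z
}

df <- data.frame(a = c(NA, NA, 3), b = c(2, NA, 2))
df$c <- df$a + df$b

# Sum only columns a and b; the resulting values are 2, NA, 5
df$s <- my_rowSums(df[c("a", "b")])
```

Selecting the columns explicitly also keeps previously computed columns from being added into later sums.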

This is particularly useful when the input x is a data frame (as in your case). base::rowSums first checks whether its input is a data frame and, if so, converts it to a matrix. That type conversion is in fact more costly than the row sum computation itself. Since we call base::rowSums twice, we convert x to a matrix once beforehand to reduce the conversion overhead.
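If you want to check the conversion cost on your own machine, a rough sketch along these lines may help. The data set big_df and its size are invented for illustration, and the timings will vary, so no numbers are claimed here.

```r
# Hypothetical data: 1e5 rows by 10 numeric columns, with one all-NA row.
set.seed(1)
big_df <- as.data.frame(matrix(rnorm(1e6), ncol = 10))
big_df[1, ] <- NA

# Data-frame path: rowSums and is.na each have to process the data frame.
t_df <- system.time({
  z1 <- rowSums(big_df, na.rm = TRUE)
  z1[!rowSums(!is.na(big_df))] <- NA
})

# Matrix path: convert once, then reuse the matrix for both steps.
t_mat <- system.time({
  m <- as.matrix(big_df)
  z2 <- rowSums(m, na.rm = TRUE)
  z2[!rowSums(!is.na(m))] <- NA
})

# Compare elapsed times; both paths should give the same values.
rbind(data_frame = t_df, matrix = t_mat)
all.equal(unname(z1), unname(z2))
```

A single system.time run is noisy, so repeat it a few times before drawing conclusions.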

For @akrun's "hacking" answer, I suggest:

akrun_rowSums <- function(x) {
  if (is.data.frame(x)) x <- as.matrix(x)
  rowSums(x, na.rm = TRUE) * NA^!rowSums(!is.na(x))
}

akrun_rowSums(df)
# [1]  2 NA 10

mm ok.. But what if I want to keep NA also in the third column? — matteo, Jul 23 '16 at 17:10
This will probably be a 2 step process. For example, `df$new <- rowSums(df, na.rm=T); is.na(df$new) <- rowSums(is.na(df)) == length(df)` — lmo, Jul 23 '16 at 17:21

score 6 · Accepted Answer · answered Jul 23 '16 at 17:22

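One option is to compute rowSums with na.rm=TRUE and multiply the result by NA^!rowSums(!is.na(df)). Here rowSums(!is.na(df)) counts the non-NA values in each row, and negating it (!) gives TRUE only for rows that are entirely NA. Because NA^TRUE is NA and NA^FALSE is 1, the multiplier turns the all-NA rows into NA and leaves every other row unchanged: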

rowSums(df, na.rm=TRUE) *NA^!rowSums(!is.na(df))
#[1]  2 NA 10
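To see why the one-liner works, it can be unpacked step by step. This is only an illustrative sketch: the intermediate names n_obs, all_na, mask and res are made up, and m holds just columns a and b of the question's data.

```r
NA^0   # 1: raising to the power 0 gives 1, even for NA
NA^1   # NA

m <- cbind(a = c(NA, NA, 3), b = c(2, NA, 2))

n_obs  <- rowSums(!is.na(m))  # non-missing values per row: 1 0 2
all_na <- !n_obs              # TRUE only where the count is 0: FALSE TRUE FALSE
mask   <- NA^all_na           # NA^FALSE is 1, NA^TRUE is NA: 1 NA 1

res <- rowSums(m, na.rm = TRUE) * mask
res                           # 2 NA 5
```

The multiplication by 1 leaves ordinary rows alone, while the 0 computed for the all-NA row is multiplied by NA and so becomes NA.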

akrun

This is a fun hack: `NA^0 == 1`. – lmo Jul 23 '16 at 17:29
you should add this as an answer to the linked question – rawr Jul 23 '16 at 18:21
