I would like to sum vectors that include NAs.

For example:

a <- c(5, 3, 1, NA, 2)
b <- c(NA, 1, 2, 1, 7)

The expected output would be:

[1] 5 4 3 1 9

sum doesn't work in this situation, because sum(a, b, na.rm = TRUE) is equivalent to sum(c(a, b), na.rm = TRUE): it returns a single total rather than an element-wise sum.

+ does work element-wise (i.e. a + b), but it has no na.rm option, so every NA propagates to the result.

You can use rowSums(cbind(a, b), na.rm = TRUE), but in practice this can lead to messy code, for example if the vectors are columns of a data.table.

Is there an equivalent of pmax for sum, e.g. psum(a, b, na.rm = TRUE)?

Dan Lewer

3 Answers

You can use mapply to apply sum to the two vectors a and b element by element. Setting na.rm = TRUE drops NA values from each individual sum:

a <- c(5, 3, 1, NA, 2)
b <- c(NA, 1, 2, 1, 7)

mapply(sum, a, b, na.rm=TRUE)

Output:

[1] 5 4 3 1 9

Alternatively, you can use Reduce, as suggested by @Roland:

Reduce("+", lapply(list(a,b), function(x) replace(x, is.na(x), 0)))
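
As a quick check (a sketch, not part of the original answer), the two approaches above and the rowSums version from the question can be compared on the same inputs. The sixth element, where both vectors are NA, is an assumption added here to show the edge case: all three approaches return 0 at that position, whereas pmax(a, b, na.rm = TRUE) would return NA there.

```r
# Inputs from the question, plus one extra position (assumption) where both are NA
a <- c(5, 3, 1, NA, 2, NA)
b <- c(NA, 1, 2, 1, 7, NA)

# Element-by-element loop: sum() is called once per position
via_mapply <- mapply(sum, a, b, na.rm = TRUE)

# Vectorised: replace NA with 0 in each vector, then add the vectors
via_reduce <- Reduce(`+`, lapply(list(a, b), function(x) replace(x, is.na(x), 0)))

# Matrix-based version mentioned in the question
via_rowsums <- rowSums(cbind(a, b), na.rm = TRUE)

print(via_mapply)
print(isTRUE(all.equal(via_mapply, via_reduce)))
print(isTRUE(all.equal(via_mapply, via_rowsums)))
```

The Reduce and rowSums versions work on whole vectors at once, so they avoid the per-element function calls that mapply makes.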
Eric
  • For longer vectors it would be very inefficient to loop over the vector elements. It would be much faster to write a version of `+` that first turns `NA` values to zero and then use `Reduce` to loop over the vectors. – Roland May 27 '20 at 12:48
  • Hi @Roland. Thank you for the information. I have added an option to use `Reduce` as well. Also, @DanLewer, if you're interested in learning more about `Reduce`, take a look at [R: Reduce() – apply’s lesser known brother](https://blog.zhaw.ch/datascience/r-reduce-applys-lesser-known-brother/) – Eric May 27 '20 at 13:25
1

mapply is what you want:

mapply(sum, a, b, na.rm = TRUE)

# [1] 5 4 3 1 9
Jrm_FRL
1

You can write your own psum function:

psum <- function(x, y) {
  # Treat missing values as zero, then add element-wise
  x[is.na(x)] <- 0
  y[is.na(y)] <- 0
  x + y
}


> psum(a,b)
[1] 5 4 3 1 9
Daniel O
  • I was going to suggest something similar but the function is wrong: its name suggests it does the equivalent to `pmax`, but it doesn’t. It has distinct, and non-generalisable semantics. To be called `psum` it should allow more than two parameters, and also expose an `na.rm` argument with appropriate semantics. – Konrad Rudolph May 27 '20 at 12:48
  • @KonradRudolph Easy to generalise with `Reduce`. – Roland May 27 '20 at 12:49
  • @Roland That’s true but that’s not even what I mean (see updated comment). – Konrad Rudolph May 27 '20 at 12:50
  • @KonradRudolph No problem. Just write a wrapper that uses an `if` condition to use this function or `+` with `Reduce`. – Roland May 27 '20 at 12:51
  • @Roland I know fixing it is not a problem. I’m pointing out that this answer doesn’t do this, and that this is a problem. – Konrad Rudolph May 27 '20 at 12:52
  • @KonradRudolph maybe the "p" in my function does not stand for "parallel", but rather "personal". As in, this is my personal function and I can name it whatever I'd like. – Daniel O May 27 '20 at 13:05
  • @DanielO In that case it’s still bad naming, and still problematic. – Konrad Rudolph May 27 '20 at 13:11
  • @KonradRudolph Those are both a matter of opinion. – Daniel O May 27 '20 at 13:49
  • @DanielO No, that’s absolutely not the case. You are falling into the (common) trap of thinking that because something (here, naming) is hard and affected by many factors it’s therefore inherently immune from objective critique. Not so. – Konrad Rudolph May 27 '20 at 13:58
  • @KonradRudolph You are falling into the (common) trap of thinking that your opinion is objective. – Daniel O May 27 '20 at 14:12
  • @DanielO No. I think you’re confusing us: I’ve given you objective *arguments* for why the name is bad in my initial comment. Your argument boils down to “I disagree”. – Konrad Rudolph May 27 '20 at 14:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214747/discussion-between-daniel-o-and-konrad-rudolph). – Daniel O May 27 '20 at 14:42
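
For reference, the generalisation raised in the comments (any number of vectors plus an na.rm argument) might be sketched as follows. This is one possible reading, not code from any of the answers: treating NA as 0 when na.rm = TRUE is an assumption that mirrors rowSums, whereas pmax returns NA where every input is NA. The third vector d is made up for illustration.

```r
# Hypothetical generalised psum: element-wise sum over any number of vectors.
# Assumption: with na.rm = TRUE, NA counts as 0 (rowSums-like behaviour).
psum <- function(..., na.rm = FALSE) {
  vectors <- list(...)
  if (length(vectors) == 0L) {
    stop("psum() needs at least one vector")
  }
  if (na.rm) {
    # Replace missing values with zero in every input vector
    vectors <- lapply(vectors, function(x) replace(x, is.na(x), 0))
  }
  # Add the vectors pairwise; with na.rm = FALSE any NA propagates as in `+`
  Reduce(`+`, vectors)
}

a <- c(5, 3, 1, NA, 2)
b <- c(NA, 1, 2, 1, 7)
d <- c(1, NA, NA, 1, 1)  # extra vector, made up for this example

res_rm    <- psum(a, b, na.rm = TRUE)     # 5 4 3 1 9
res_keep  <- psum(a, b)                   # NA 4 3 NA 9
res_three <- psum(a, b, d, na.rm = TRUE)  # 6 4 3 2 10
print(res_rm)
print(res_keep)
print(res_three)
```

If pmax-style behaviour is preferred, the positions where every input is NA would have to be recorded before the replacement and set back to NA afterwards.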