16

I want to evaluate an expression in R element by element. I don't want to use the function sum because it returns a single value; I want the full vector of values.

x = 1:10
y = c(21:29,NA)
x+y
 [1] 22 24 26 28 30 32 34 36 38 NA

x = 1:10
y = c(21:30)
x+y
 [1] 22 24 26 28 30 32 34 36 38 40

I don't want:

sum(x,y, na.rm = TRUE)
[1] 280

Which returns a single value instead of a vector of element-wise sums.

This is a toy example, but I have a more complex equation using multiple vectors, each of length 84647.

Here is another example of what I mean:

x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)
5 +2*(x+y-50)/(x+y+z+a) 
 [1]       NA       NA       NA 4.388889 4.473684 4.550000 4.619048 4.681818 4.739130       NA
M--
M. Beausoleil
  • get the row wise sum of the concatenated vectors. `rowSums(cbind(x,y), na.rm = T)` – M-- Jul 25 '17 at 18:58
  • Ok, so you would put it in a data frame format. It's not possible to use them directly as vectors? `apply(cbind(x,y), 1, function(x) sum(x, na.rm = T))` – M. Beausoleil Jul 25 '17 at 19:00
  • `cbind` creates matrices, not data frames. `rowSums` is optimized and will be very quick, faster than `apply(..., 1, sum, na.rm = T)`. – Gregor Thomas Jul 25 '17 at 19:08
  • *Skipping* NAs in a complex expression does not require a custom-defined `+`. You can simply drop the NAs from the final result, e.g. `res<-res[!is.na(res)]`, which also avoids the risk of summing vectors of different lengths (when they contain different numbers of NAs). If you want to *replace* NAs with zero instead, that's another story... – digEmAll Jul 26 '17 at 07:01

5 Answers

18

1) %+% Define a custom + operator:

`%+%` <- function(x, y)  mapply(sum, x, y, MoreArgs = list(na.rm = TRUE))
5 + 2 * (x %+% y - 50) / (x %+% y %+% z %+% a)

giving:

[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879

Here are some simple examples:

1 %+% 2
## [1] 3

NA %+% 2
## [1] 2

2 %+% NA
## [1] 2

NA %+% NA
## [1] 0
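Because `mapply` calls `sum` once per element, this operator may be slow on vectors of the length mentioned in the question (84647). Below is a minimal vectorized sketch, assuming NA should simply count as 0. The name `%plus%` and the test vectors `u` and `v` are hypothetical and not part of the original answer; the timings will vary by machine, so none are shown.

```r
# Original element-by-element definition from the answer above.
`%+%` <- function(x, y) mapply(sum, x, y, MoreArgs = list(na.rm = TRUE))

# Hypothetical vectorized variant: replace NA with 0, then add.
`%plus%` <- function(x, y) {
  x[is.na(x)] <- 0
  y[is.na(y)] <- 0
  x + y
}

# Test vectors of the length mentioned in the question, with some NAs.
set.seed(1)
n <- 84647
u <- sample(c(1:100, NA), n, replace = TRUE)
v <- sample(c(1:100, NA), n, replace = TRUE)

# Both definitions should agree element by element.
same <- isTRUE(all.equal(as.numeric(u %+% v), u %plus% v))
print(same)

# Compare run times (machine dependent).
print(system.time(u %+% v))
print(system.time(u %plus% v))
```

Note that all `%op%` operators bind more tightly than `*`, `/`, `+` and `-`, so either operator can be dropped into the expression above without extra parentheses.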

2) na2zero Another possibility is to define a function which maps NA to 0 like this:

na2zero <- function(x) ifelse(is.na(x), 0, x)

X <- na2zero(x)
Y <- na2zero(y)
Z <- na2zero(z)
A <- na2zero(a)

5 + 2 * (X + Y - 50) / (X + Y + Z + A)

giving:

[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879
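To avoid creating four separate variables, the same mapping can be applied to a list of inputs and the expression evaluated inside `with()`. This is a sketch under the assumption that the temporary names `X`, `Y`, `Z`, `A` are only needed for this one expression; `res` is a hypothetical name for the result.

```r
x <- 1:10
y <- c(21:29, NA)
z <- 11:20
a <- c(NA, NA, NA, 30:36)

na2zero <- function(x) ifelse(is.na(x), 0, x)

# Apply na2zero to every input at once; the names X, Y, Z, A are
# only visible inside with().
res <- with(
  lapply(list(X = x, Y = y, Z = z, A = a), na2zero),
  5 + 2 * (X + Y - 50) / (X + Y + Z + A)
)
print(res)
```

The global workspace is left untouched apart from `res`, which can be convenient when many vectors are involved.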

3) combine above A variation combining (1) with the idea in (2) is:

X <- x %+% 0
Y <- y %+% 0
Z <- z %+% 0
A <- a %+% 0

5 + 2 * (X + Y - 50) / (X + Y + Z + A)

4) numeric0 class We can define a custom class "numeric0" with its own + operator:

as.numeric0 <- function(x) structure(x, class = "numeric0")
`+.numeric0` <- `%+%`

X <- as.numeric0(x)
Y <- as.numeric0(y)
Z <- as.numeric0(z)
A <- as.numeric0(a)

5 + 2 * (X + Y - 50) / (X + Y + Z + A)

Note: The inputs used were those in the question, namely:

x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)
G. Grothendieck
  • I was trying to find a way to use `mapply` and get four separate vectors (`X, Y, Z, A`). Out of curiosity, is it possible? (p.s. sorry for the edit, I saw a typo) – M-- Jul 25 '17 at 19:34
  • You can do this `attach(lapply(list(X = x, Y = y, Z = z, A = a), na2zero))` or `with(lapply(...), ...expression in X, Y, Z, A...)` – G. Grothendieck Jul 25 '17 at 19:38
16

Using rowSums:

To elaborate on my comment, you can bind the vectors together as columns and then apply your calculations to the resulting matrix. This is the solution for the example that you provided at the end of your question:

5 + 2 * (rowSums(cbind(x,y), na.rm = T)-50)/(rowSums(cbind(x,y,z,a), na.rm = T))

#  [1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818 
#  [9] 4.739130 3.787879
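One thing to keep in mind is that `rowSums(..., na.rm = TRUE)` returns 0 for a position where every input is NA. If such positions should stay NA instead, a small wrapper can restore them. This is only a sketch; the helper name `sum_na_aware` and the vectors `p` and `q` are hypothetical.

```r
# Hypothetical helper: row-wise sum that ignores NA but returns NA
# when every input is NA at that position.
sum_na_aware <- function(...) {
  m <- cbind(...)
  s <- rowSums(m, na.rm = TRUE)
  s[rowSums(!is.na(m)) == 0] <- NA  # rows with no observed value
  s
}

p <- c(1, NA, 3)
q <- c(10, NA, NA)

print(rowSums(cbind(p, q), na.rm = TRUE))  # all-NA position becomes 0
print(sum_na_aware(p, q))                  # all-NA position stays NA
```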

Replacing NA:

I have seen solutions here with the idea of replacing NA in the vectors; I think this would be helpful too:

y[is.na(y)] <- 0 #indexing NA values and replacing with zero
M--
7

You can use ifelse():

x = 1:10
y = c(21:29,NA)
x+y

[1] 22 24 26 28 30 32 34 36 38 NA

x + ifelse(is.na(y), 0, y)

[1] 22 24 26 28 30 32 34 36 38 10
Mouad_Seridi
7

DATA

x = 1:10
y = c(21:29,NA)
x+y
# [1] 22 24 26 28 30 32 34 36 38 NA

1

foo1 = function(...){
    return(rowSums(cbind(...), na.rm = TRUE))
}
foo1(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10

2

foo2 = function(...){
    Reduce('+', lapply(list(...), function(x) replace(x, is.na(x), 0)))
}
foo2(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10
d.b
  • No need to go to `data.frame`, use `cbind()` instead to keep it as a matrix and avoid the extra conversions (`rowSums` will just convert back to a matrix). – Gregor Thomas Jul 25 '17 at 19:10
5

Just for laffs:

x=1:10
y=c(21:29, NA)

"[<-"(x, is.na(x), 0) + "[<-"(y, is.na(y), 0)
# [1] 22 24 26 28 30 32 34 36 38 10

which again illustrates the fact that everything in R is a function (and also shows that the R interpreter is smart enough to turn a string into a function when required).
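A few more small illustrations of the same point, as a sketch; the variable names `r1` to `r4` and `f` are arbitrary.

```r
# A quoted operator name can be called like any other function.
r1 <- "+"(1, 2)

# Functions that take a function argument also accept its name as a string.
r2 <- sapply(list(c(1, 2), c(3, 4)), "[", 2)  # second element of each vector
r3 <- do.call("sum", list(1, 2, 3))

# match.fun() makes the string-to-function lookup explicit.
f <- match.fun("+")
r4 <- f(10, 5)

print(r1)
print(r2)
print(r3)
print(r4)
```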

Syntactically sweetened:

na.zero <- function(x)
{
    "[<-"(x, is.na(x), 0)
}
na.zero(x) + na.zero(y)
# [1] 22 24 26 28 30 32 34 36 38 10

More broadly applicable version:

na.replace <- function(x, value)
{
    "[<-"(x, is.na(x), value)
}
na.replace(x, 1) * na.replace(x, 1)
# [1]   1   4   9  16  25  36  49  64  81 100
Hong Ooi