159

Suppose I have an n-by-2 matrix and a function that takes a 2-vector as one of its arguments. I would like to apply the function to each row of the matrix and get an n-vector. How can I do this in R?

For example, I would like to compute the density of a 2D standard Normal distribution on three points:

bivariate.density <- function(x = c(0, 0), mu = c(0, 0), sigma = c(1, 1), rho = 0) {
    # Standardize each coordinate by its mean and standard deviation
    z <- (x - mu) / sigma
    exp(-(z[1]^2 + z[2]^2 - 2*rho*z[1]*z[2]) / (2*(1-rho^2))) /
        (2*pi*sigma[1]*sigma[2]*sqrt(1-rho^2))
}

out <- rbind(c(1, 2), c(3, 4), c(5, 6))

How can I apply the function to each row of out?

And how can I pass values for the function's other arguments (besides the points) in the approach you suggest?

zx8754
Tim

7 Answers

210

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16
R> 

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
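As a minimal sketch of passing extra arguments, the hypothetical helper `weighted` below (its name and its parameters `a` and `b` are made up for illustration) receives each row as its first argument, while `a` and `b` are forwarded through `apply()`:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical helper: `row` receives each row of M,
# `a` and `b` are extra parameters forwarded by apply().
weighted <- function(row, a, b) a * row[1] + b * row[2]

# Everything after FUN is passed on to `weighted` for every row.
res <- apply(M, 1, weighted, a = 2, b = 1)
res
```

With `a = 2` and `b = 1` this reproduces the anonymous function used above.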

Dirk Eddelbuettel
  • Thanks! What if the row of the matrix is not the first arg of the function? How do I specify which arg of the function each row is assigned to? – Tim Nov 21 '10 at 04:10
  • Read the help for `apply()` -- it sweeps by row (when the second arg is 1, else by column), and the current row (or col) is always the first argument. That is how things are defined. – Dirk Eddelbuettel Nov 21 '10 at 04:15
  • 1
    @Tim : if you use an internal R function and the row is not the first arg, do as Dirk did and make your own custom function where row **is** the first arg. – Joris Meys Nov 22 '10 at 12:58
  • 3
    The plyr package provides a wide range of these apply kinds of functions. It also provides more functionality, including parallel processing. – Paul Hiemstra Nov 30 '11 at 13:29
  • Can you explain what `1` means in `apply(M, 1...)`? – cryptic0 Nov 08 '17 at 18:24
  • 7
    @cryptic0 this answer is late, but for googlers, the second argument in apply is the `MARGIN` argument. Here it means apply the function to the rows (the first dimension in `dim(M)`). If it were 2, it would apply the function to the columns. – De Novo Mar 05 '18 at 08:24
  • 1
    This answer is unfortunately not very helpful to the people wanting to apply complex functions to a data.frame with multiple column types. This is because apply function will convert the data.frame to matrix, which can have only one type of input across all the fields. – Carlito Sep 07 '22 at 12:55
  • @Carlito Please note that a) the question is twelve years old (!!) and b) its OP starts with a matrix. You are of course welcome, even encouraged, to add your answer if you feel a need to generalize. – Dirk Eddelbuettel Sep 07 '22 at 13:34
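The wrapper idea from the comments can be sketched as follows. The function `scaled_norm` is hypothetical and exists only to show a case where the data point is not the first parameter:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical function whose first argument is NOT the data point.
scaled_norm <- function(scale, point) scale * sqrt(sum(point^2))

# Option 1: wrap it so that the row is the wrapper's first argument.
res <- apply(M, 1, function(row) scaled_norm(scale = 2, point = row))

# Option 2: name the other arguments; the row then fills the first
# parameter that is still unmatched (`point`).
res2 <- apply(M, 1, scaled_norm, scale = 2)
```

Both calls should give the same vector; the wrapper form is the more explicit of the two.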
17

Here is a short example of applying a function to each row of a matrix. (Here, the function normalizes every row so that it sums to 1.)

Note: The result of apply() has to be transposed with t() to get the same layout as the input matrix A, because apply() returns each row's result as a column.

A <- matrix(c(
  0, 1, 1, 2,
  0, 0, 1, 3,
  0, 0, 1, 3
), nrow = 3, byrow = TRUE)

t(apply(A, 1, function(x) x / sum(x) ))

Result:

     [,1] [,2] [,3] [,4]
[1,]    0 0.25 0.25 0.50
[2,]    0 0.00 0.25 0.75
[3,]    0 0.00 0.25 0.75
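To make the transposition visible, the sketch below inspects the dimensions of the raw `apply()` result. It also shows a vectorized alternative, `A / rowSums(A)`, which is not part of the answer above and relies on R recycling the row sums down each column:

```r
A <- matrix(c(
  0, 1, 1, 2,
  0, 0, 1, 3,
  0, 0, 1, 3
), nrow = 3, byrow = TRUE)

# Each row's result becomes a COLUMN of the output.
raw <- apply(A, 1, function(x) x / sum(x))
dim(raw)   # 4 rows, 3 columns: transposed relative to A

normalized <- t(raw)

# Vectorized alternative: element [i, j] is divided by rowSums(A)[i].
vectorized <- A / rowSums(A)
```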
Viliam Simko
17

If you want to apply common functions such as sum or mean, you should use rowSums or rowMeans, since they are faster than the apply(data, 1, sum) approach. Otherwise, stick with apply(data, 1, fun). You can pass additional arguments after the FUN argument (as Dirk already suggested):

set.seed(1)
m <- matrix(round(runif(20, 1, 5)), ncol=4)
diag(m) <- NA
m
     [,1] [,2] [,3] [,4]
[1,]   NA    5    2    3
[2,]    2   NA    2    4
[3,]    3    4   NA    5
[4,]    5    4    3   NA
[5,]    2    1    4    4

Then you can do something like this:

apply(m, 1, quantile, probs=c(.25,.5, .75), na.rm=TRUE)
    [,1] [,2] [,3] [,4] [,5]
25%  2.5    2  3.5  3.5 1.75
50%  3.0    2  4.0  4.0 3.00
75%  4.0    3  4.5  4.5 4.00
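As a quick check of the rowMeans advice, this sketch rebuilds the same matrix and compares the two approaches. It only confirms that the results agree and makes no timing claim:

```r
set.seed(1)
m <- matrix(round(runif(20, 1, 5)), ncol = 4)
diag(m) <- NA

# Generic route: na.rm is forwarded to mean() for every row.
via_apply <- apply(m, 1, mean, na.rm = TRUE)

# Specialized route: rowMeans() has its own na.rm argument.
via_rowMeans <- rowMeans(m, na.rm = TRUE)

all.equal(via_apply, via_rowMeans)
```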
aL3xa
9

apply() does the job well, but it can be quite slow. sapply() and vapply() can be useful alternatives, and so can dplyr's rowwise(). Here is an example that computes the row-wise product of a data frame.

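# Transpose so that each original row becomes a column of `a`
a = data.frame(t(iris[1:10,1:3]))
vapply(a, prod, 0)   # 0 is the template: one numeric value per column
sapply(a, prod)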

Note that assigning the data to a variable before calling vapply/sapply/apply is good practice, because the subsetting and transposing are then not repeated inside every call. The following microbenchmark call compares the variants:

library(dplyr)  # provides %>%, rowwise() and summarise()

a = data.frame(t(iris[1:10,1:3]))
b = iris[1:10,1:3]
microbenchmark::microbenchmark(
    apply(b, 1, prod),
    vapply(a, prod, 0),
    sapply(a, prod),
    apply(iris[1:10,1:3], 1, prod),
    vapply(data.frame(t(iris[1:10,1:3])), prod, 0),
    sapply(data.frame(t(iris[1:10,1:3])), prod),
    b %>% rowwise() %>%
        summarise(p = prod(Sepal.Length,Sepal.Width,Petal.Length))
)

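Look carefully at how t() is used: sapply() and vapply() iterate over the columns of a data frame, so the data is transposed first to turn its rows into columns.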

Pratham
  • It might be more fair to compare the apply family if you used `b <- t(iris[1:10, 1:3])` and `apply(b, 2, prod)`. – DaSpeeg Dec 13 '18 at 16:59
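For a matrix, an alternative that avoids transposing is to loop over the row indices with `vapply()`. This is only a sketch; `row_fun` is a hypothetical stand-in for whatever per-row function is needed, and no claim is made here about its speed relative to `apply()`:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical per-row function.
row_fun <- function(x) 2 * x[1] + x[2]

# numeric(1) is the template: vapply() stops with an error if any
# call returns something other than a single numeric value.
res <- vapply(seq_len(nrow(M)), function(i) row_fun(M[i, ]), numeric(1))
res
```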
5

The first step is to create the function object, and the second is to apply it. If you want a matrix object with the same number of rows, you can predefine it and use the object[] form as illustrated (otherwise the returned value will be simplified to a vector):

bvnormdens <- function(x=c(0,0), mu=c(0,0), sigma=c(1,1), rho=0){
     # Standardize each coordinate by its mean and standard deviation
     z <- (x - mu)/sigma
     exp(-(z[1]^2 + z[2]^2 - 2*rho*z[1]*z[2]) / (2*(1-rho^2))) /
         (2*pi*sigma[1]*sigma[2]*sqrt(1-rho^2))
     }
 out <- rbind(c(1,2), c(3,4), c(5,6))

 # Predefine a 3 x 1 matrix; assigning into bvout[] keeps its dimensions
 bvout <- matrix(NA, ncol=1, nrow=3)
 bvout[] <- apply(out, 1, bvnormdens)
 bvout
             [,1]
[1,] 1.306423e-02
[2,] 5.931153e-07
[3,] 9.033134e-15

If you want to use values other than the default parameters, the call should include named arguments after the function:

bvout[] <-apply(out, 1, FUN=bvnormdens, mu=c(-1,1), rho=0.6)

apply() can also be used on higher dimensional arrays and the MARGIN argument can be a vector as well as a single integer.
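A small sketch of the higher-dimensional case, using a made-up 2 x 3 x 4 array: with `MARGIN = c(1, 2)` the function receives the values along the third dimension for each (row, column) pair, and with `MARGIN = 3` it receives each 2 x 3 slice.

```r
arr <- array(1:24, dim = c(2, 3, 4))

# Keep dimensions 1 and 2; sum over dimension 3.
sums <- apply(arr, c(1, 2), sum)
dim(sums)   # 2 3

# Keep dimension 3; FUN sees one 2 x 3 slice at a time.
slice_max <- apply(arr, 3, max)
```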

IRTFM
2

Another approach, if you want to use a varying portion of the dataset instead of a single value, is rollapply(data, width, FUN, ...) from the zoo package. Using a vector of widths lets you apply a function on a varying window of the dataset. I've used this to build an adaptive filtering routine, though it isn't very efficient.
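The varying-window idea can be illustrated in base R. The sketch below is not zoo's `rollapply()`; `roll_var` is a hypothetical helper, and it assumes right-aligned (trailing) windows that are truncated at the start of the data:

```r
# Hypothetical helper: apply FUN to a trailing window whose width
# may differ at every position (NOT zoo's implementation).
roll_var <- function(x, widths, FUN, ...) {
  vapply(seq_along(x), function(i) {
    start <- max(1, i - widths[i] + 1)   # truncate at the left edge
    FUN(x[start:i], ...)
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5, 6)
widths <- c(1, 2, 3, 1, 2, 3)
res <- roll_var(x, widths, sum)
res
```

For real work, the alignment and edge handling of zoo's own function should be checked in its documentation rather than inferred from this sketch.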

joran
DWAHL
0

A dplyr approach using across, rowSums and rowMeans:

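library(dplyr)

M <- matrix(1:9, nrow=3, byrow=TRUE)
M
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

# as.data.frame() names the unnamed columns V1, V2, V3
M %>% as.data.frame() %>%
  rowwise() %>% 
  mutate(sum = rowSums(across(where(is.numeric)))) %>% 
  mutate(mean = rowMeans(across(V1:V3))) %>%
  # c_across() gathers the row's values; max(V1:V3) would only build the sequence from V1 to V3
  mutate(Max = max(c_across(V1:V3))) %>%
  mutate(Min = min(c_across(V1:V3))) %>%
  as.matrix()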

     V1 V2 V3 sum mean Max Min
[1,]  1  2  3   6    2   3   1
[2,]  4  5  6  15    5   6   4
[3,]  7  8  9  24    8   9   7
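For comparison, the same summary columns can be sketched in base R without converting the matrix. This is an alternative, not part of the dplyr approach above; the leading columns stay unnamed here:

```r
M <- matrix(1:9, nrow = 3, byrow = TRUE)

# Bind the per-row summaries to the original matrix as named columns.
res <- cbind(
  M,
  sum  = rowSums(M),
  mean = rowMeans(M),
  Max  = apply(M, 1, max),
  Min  = apply(M, 1, min)
)
res
```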
rubengavidia0x