Avoiding a loop when populating data frames in R

Question

I have an empty data frame T_modelled with 2784 columns and 150 rows.

T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))

where

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

I filled T_modelled by column with a nested for loop, based on a formula:

for (i in 1:ncol(T_modelled)) {
  col_tmp <- colnames(T_modelled)[i]
  for (j in 1:nrow(T_modelled)) {
    z_tmp <- z[j]-0.1
    T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
    T_modelled[j ,col_tmp] <- T_tmp
  }
}

where

MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)

I do get the desired results but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized way of calculating. I just cannot see myself how to incorporate the formula in an easier way to fill T_modelled.

Anyone got any ideas how to get the same result in a faster, more "R-like" manner?

Didn't you just ask a [similar question](https://stackoverflow.com/q/47728845/1422451) with an accepted solution? Why return back to nested `for` loops? — Parfait, Dec 11 '17 at 19:38
Since I started learning R about two months ago, I have been using loops quite a lot. Just like with the similar question, this loop has been around for a while, rather than writing it after seeing the solution to my last question. Just recently, I came across more and more blogs and comments that suggest to avoid loops when performing tasks in R. This is why I am trying to make a shift but `sapply` for example still doesn't come as "naturally" to me as loops do. It starts to grow on me, however, and I believe I get the solution with a function call to sapply now. — Arne Brandschwede, Dec 12 '17 at 10:15

score 2 · Answer 1 · answered Dec 11 '17 at 19:01

2

I believe this does it.
Run this first instruction right after creating T_modelled, it will be needed to test that the results are equal.

Tm <- T_modelled

Now run your code then run the code below.

z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
    T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
    Tm[ , i] <- T_tmp
}

all.equal(T_modelled, Tm)
#[1] TRUE

You don't need the inner loop, that's the only difference.
(I also used exp directly but that is of secondary importance.)

answered Dec 11 '17 at 19:01

Rui Barradas

70,273
8
34
66

Yep, that makes the loop more consistent. I would still prefer the options with sapply that take advantage of R's vectorized way of performing tasks. – Arne Brandschwede Dec 12 '17 at 10:01
Do note: the *apply* family of functions including `sapply` are loops just hidden underneath, not vectorized. Read this [excellent post](https://stackoverflow.com/questions/28983292/is-the-apply-family-really-not-vectorized). Often a misconception but there is nothing wrong with `for` loops in R. Your issue was nested loops in updating every row and column (i.e., cell) at a time so 150x2784 iterations! Here and with `sapply` a whole column is processed in each iteration, so just 150. – Parfait Dec 12 '17 at 14:18

score 2 · Answer 2 · answered Dec 11 '17 at 20:41

2

Much like your previous question's solution which you accepted, consider simply using sapply, iterating through the vector, t_sec_ERT, which is the same length as your desired dataframe's number of columns. But first adjust every element of z by 0.1. Plus, there's no need to create empty dataframe beforehand.

z_adj <- z - 0.1

T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
        MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))

colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)

all.equal(T_modelled, T_modelled2)
# [1] TRUE

answered Dec 11 '17 at 20:41

Parfait

104,375
17
94
125

Yes, that works just fine. I think it is even easier to read when you store the function beforehand so you just need to call it to `sapply`. But that might just be a matter of taste. Defining `z_adj` prior to the function makes sense as well. – Arne Brandschwede Dec 12 '17 at 10:00
Indeed you can do that. Here we use an anonymous function in `sapply`. – Parfait Dec 12 '17 at 14:11

score 1 · Accepted Answer · answered Dec 11 '17 at 19:46

Rui is of course correct, I just want to suggest a way of reasoning when writing a loop like this.

You have two numeric vectors. Functions for numerics in R are usually vectorized. By which I mean you can do stuff like this

x <- c(1, 6, 3)
sum(x)

not needing something like this

x_ <- 0
for (i in x) {
    x_ <- i + x_ 
}
x_

That is, no need for looping in R. Of course looping takes place none the less, it just happens in the underlying C, Fortran etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: looping takes place "under the hood" as it were. The output of Vectorize() thus isn't strictly vectorized by this definition.

When you have two numeric vectors you want to loop over you have to first see if the constituent functions are vectorized, usually by reading the docs.

If it is, you continue by constructing that central vectorized compound function and and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT).

z_tmp <- z - 0.1
i <- 1

T_tmp <- MANSRT + As * 
         exp(-z_tmp*(omega/(2*K))^0.5) * 
         sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)

Looks OK. Then you start looping over the elements of t_sec_ERT.

T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))

for (i in 1:length(t_sec_ERT)) {
    T_tmp[, i] <- MANSRT + As * 
             exp(-z_tmp*(omega/(2*K))^0.5) * 
             sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}

Or you can do it with sapply() which is often neater.

f <- function(x) {
    MANSRT + As * 
    exp(-z_tmp*(omega/(2*K))^0.5) * 
    sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}

T_tmp <- sapply(t_sec_ERT, f)

The solution with storing the function in `f` and then calling it to `sapply` is nice! If you set `data.frame(sapply(t_sec_ERT, f))`, you have it initially stored as a data frame. — Arne Brandschwede, Dec 12 '17 at 10:08

score 0 · Answer 4 · answered Dec 11 '17 at 19:19

I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:

t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

v <- expand.grid(t_sec_ERT, z) 
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)

T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))

Avoiding a loop when populating data frames in R

4 Answers4