Dynamically update input dataframe at each iteration of function without global assignment

Question

I have (1) a reference table of ratings, and (2) a function which randomly generates results based on these ratings and updates the ratings based upon the generated result.

Although there are easier solutions to the reproducible example below, the intended application is to simulate results between opponents based upon their Elo ratings, with ratings being updated after each round in order to run the simulations 'hot'.

Here, I have a reference table of ratings ref and use the function genResult to generate a random result and update the reference table using global assignment.

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  # assign('ref', ref, envir=.GlobalEnv)
  ref <<- ref

  return(list(result_i, ref))
}

Replicating this function twice, we can see ref is updated as expected.

replicate(2, genResult(ref), simplify = FALSE)

This returns the following, where the reference table is updated in each of the two iterations:

[[1]]
[[1]][[1]]
  id score
1  A     1

[[1]][[2]]
  id rating
1  A    130
2  B    179
3  C    141
4  D    188
5  E    194


[[2]]
[[2]][[1]]
  id score
1  C    -2

[[2]][[2]]
  id rating
1  A    130
2  B    179
3  C    139
4  D    188
5  E    194

Now let's say I want to replicate the above (replicated) function; simulating 3 separate instances of 5 results with dynamically updated ratings and outputting only the results. I make the reference table ref again and define a similar function which uses global assignment:

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  ref <<- ref

  return(result_i)
}

Then I use lapply and collapse the list of results into a data frame:

library(dplyr)  # provides %>% and mutate(); plyr must also be installed for rbind.fill()

lapply(1:3, function(i) {

  ref_i <- ref

  replicate(5, genResult2(ref_i), simplify = FALSE) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

Returning:

   id score i
1   A     1 1
2   C    -2 1
3   B     9 1
4   A    26 1
5   A    -9 1
6   D    10 2
7   D     8 2
8   C     5 2
9   A    36 2
10  C    17 2
11  B    14 3
12  B   -15 3
13  B    -4 3
14  A   -22 3
15  B   -13 3

Now this seems to work as expected, but (i) it feels really ugly, and (ii) I've read countless times that global assignment can and will cause unexpected injury.

Can anyone suggest a better solution?

moodymudskipper · Accepted Answer · 2018-07-03T10:08:20.053

6

If you're iterating and each iteration depends on the previous one, it's often a good sign that you should use an old-fashioned for loop rather than replicate or the apply functions. (Another possibility would be to use Reduce with the accumulate parameter set to TRUE.)

This gives the same results as the code you posted, with the updated rating joined on. I used a for loop and made your function return ref as well:

genResult3 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  #ref <<- ref

  return(list(result_i,ref)) # added ref to output
}

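library(dplyr)  # provides %>%, left_join() and mutate()

lapply(1:3, function(i) {
  res <- vector("list", 5) # pre-allocate a list for the 5 results
  for (k in 1:5){
    gr <- genResult3(ref)
    res[[k]] <- gr[[1]] # get result
    ref      <- gr[[2]] # update rating (local copy, so each i starts from the original ref)
    res[[k]] <- left_join(res[[k]], ref, by = "id") # combine for output
  }
  plyr::rbind.fill(res) %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()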

Returning:

   id score rating i
1   A     1    130 1
2   C    -2    139 1
3   B     9    188 1
4   A    26    156 1
5   A    -9    147 1
6   D    10    198 2
7   D     8    206 2
8   C     5    146 2
9   A    36    165 2
10  C    17    163 2
11  B    14    193 3
12  B   -15    178 3
13  B    -4    174 3
14  A   -22    107 3
15  B   -13    161 3
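
The Reduce alternative mentioned above can be sketched roughly as follows. This is only an illustration in base R: the helper name step and the state list with its ref and result elements are hypothetical, not part of the original code. The idea is that each call receives the state left by the previous call, so no assignment outside the function is needed.

```r
# Hypothetical helper: takes the current state, returns the next state
# (updated ratings plus the result that was just generated).
step <- function(state, k) {
  ref     <- state$ref
  id_i    <- LETTERS[floor(runif(1, 1, 5))]
  score_i <- round(rnorm(1, 0, 20))
  ref$rating[ref$id == id_i] <- ref$rating[ref$id == id_i] + score_i
  list(ref = ref, result = data.frame(id = id_i, score = score_i))
}

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

# accumulate = TRUE keeps every intermediate state; the first element is
# the initial state, so it is dropped when collecting the results.
states <- Reduce(step, 1:5,
                 init = list(ref = ref, result = NULL),
                 accumulate = TRUE)

results   <- do.call(rbind, lapply(states[-1], `[[`, "result"))
final_ref <- states[[length(states)]]$ref
```

Here results holds the five generated rows and final_ref the ratings after the last step, while the original ref is left unchanged.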

edited Jul 03 '18 at 10:08

answered Jul 02 '18 at 01:34

moodymudskipper


That's great thanks -- I think I must have developed an aversion to loops in R a few years ago, thinking `*apply` was superior in every case! Yeah it wasn't the best example in the end but I've added `ref` to the output in your answer to see `rating` being updated and reset at each iteration. – jogall Jul 03 '18 at 09:42

loopophobia is common but it's curable :). For loops are often less readable (to someone who went through the *apply drill), but when they're more readable or avoid bad practices like `<<-` you should use them, they're about as fast if done right. See this post: https://stackoverflow.com/questions/48793273/why-not-use-a-for-loop/48793370#48793370 – moodymudskipper Jul 03 '18 at 10:22

Great answer! I think understanding that vectorised functions in R are often faster because they're optimised in C explains the misconception that `*apply` is always better than for loops. – jogall Jul 04 '18 at 11:34

Yes, and some maintain that for loops can even be faster, though in my answer linked above I'm still waiting for someone to propose an example of such a situation. In any case it's the same order of magnitude. – moodymudskipper Jul 04 '18 at 11:39

SeGa · Answer 2 · 2018-06-26T13:54:54.847

You can create a new environment with new.env() and do the calculations there. Applying that idea to your first function gives this:

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))
ref1

refEnv <- new.env()
refEnv$ref = ref1

genResult <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(list(result_i, ref))
}
replicate(2, genResult(refEnv$ref), simplify = FALSE)

ref1
refEnv$ref

You will see that the original ref1 is not touched and remains the same, while refEnv$ref contains the result from the last iteration.
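
The reason is that a data frame passed to a function is modified as a local copy, whereas an environment is passed by reference. A minimal sketch of that difference, using hypothetical helper names (bump_copy, bump_env) and made-up ratings purely for illustration:

```r
# Hypothetical helpers, not part of the code above.
bump_copy <- function(df) {
  df$rating <- df$rating + 1      # changes a local copy only
  invisible(df)
}
bump_env <- function(env) {
  env$ref$rating <- env$ref$rating + 1  # changes the object stored in env
  invisible(env)
}

ref1 <- data.frame(id = LETTERS[1:3], rating = c(100, 110, 120))
refEnv <- new.env()
refEnv$ref <- ref1

bump_copy(ref1)   # ref1 itself is unchanged afterwards
bump_env(refEnv)  # refEnv$ref now holds the incremented ratings
```

After running this, ref1$rating still holds the starting values, while refEnv$ref$rating has been incremented.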

Applying the same idea to your second function with lapply:

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                   rating = round(runif(5, 100, 200)))
ref1

refEnv <- new.env()
refEnv$ref = ref1


genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(result_i)
}

library(dplyr)  # provides %>% and mutate()

# Run 3 sets of 5 results; refEnv$ref is updated after every call
# and carries over from one set to the next.
lapply(1:3, function(i) {

  replicate(5, genResult2(refEnv$ref), simplify = FALSE) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

ref1
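
If each of the three sets should instead start from the original ratings, one option is to reset the environment at the start of each set. The following is a rough base R sketch under that assumption; genResultEnv is a hypothetical variant of genResult2 that takes the environment itself rather than the data frame.

```r
set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                   rating = round(runif(5, 100, 200)))
refEnv <- new.env()

# Hypothetical variant: updates the ratings stored in env in place
# and returns only the generated result.
genResultEnv <- function(env) {
  id_i    <- LETTERS[floor(runif(1, 1, 5))]
  score_i <- round(rnorm(1, 0, 20))
  hit     <- env$ref$id == id_i
  env$ref$rating[hit] <- env$ref$rating[hit] + score_i
  data.frame(id = id_i, score = score_i)
}

sims <- lapply(1:3, function(i) {
  refEnv$ref <- ref1  # reset to the original ratings for this set
  res <- do.call(rbind, replicate(5, genResultEnv(refEnv), simplify = FALSE))
  res$i <- i
  res
})
out <- do.call(rbind, sims)
```

With this arrangement refEnv$ref ends up holding the ratings from the last set only, and ref1 stays untouched.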

Thanks for your answer @SeGa, I never considered using `new.env()` and it's a useful thing to know for the future :) However, I accepted the other answer here as it lets me avoid assignment in any environment, which I think makes it easier for users to see what is going on when the code is included in a package. — jogall, Jul 03 '18 at 09:50
