1

I am learning R and have written some code which produces the desired outputs but uses loops, which I would like to reply with some type of apply function.

I have a data frame, results, which stores outcomes of matches of a two-player game.

Player1   Player2   Result
Alice     Bob       Win
Charlie   Dennis    Win
Elena     Frank     Loss
...

And another, scores, which stores each player's overall score.

Player    Score
Alice     1200
Charlie   1200
Frank     1200
Bob       800
Dennis    800
Elena     800
...

The way the scores are calculated is using a function which takes in a single result, the current scores, and returns a new data frame with the new scores.

updateScores <- function(result, scores) {
  [ Code that calculates new scores based on a single result ]
  return(scores)
}

Now the problem is that I want to loop through the results and update the scores table. This is trivial with a for loop:

for(i in 1:nrow(results)) {
  scores <- updateScores(results[i, ], scores)
}

But I am struggling to see how I can do this with apply, mapply, or any other functional way which avoids the loop. This is because the scores variable needs to be passed to updateScores and updated with every iteration. (In Lisp I would use the reduce function but in R, Reduce() doesn't work the same way.)

Frostbitten
  • 31
  • 1
  • 5
  • It would help to know what exactly you are trying to do and what you expect, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bulat Sep 28 '19 at 18:22
  • Basically you could do this with `sapply(1:nrow(results), function(x)updateScores(results[i, ])` assuming the updating is independent of previous scores and calculations. But there is likely a trivial solution using vectorized functions avoiding `apply` functions altogether. We are however unable to help with this, as we do not know the intricates of the `updateScores` function. – Oliver Sep 28 '19 at 18:33
  • @Oliver The `updateScoes` function calculates the Elo rating of the two players involved in the single game passed as an argument. But in general, assuming the function in question isn't a simple one, would you use `sapply` here? – Frostbitten Sep 28 '19 at 18:47
  • @Bulat What am I missing? The objective is to do exactly what I have done with the for loop, on the data given, but without using a for loop. I am coming from an OOP background and I'm doing this exercise to understand functional programming in R better. – Frostbitten Sep 28 '19 at 18:53
  • In general you are not going to gain speed from using `*apply` functions over an optimized `for loop`, outside parallelization of your calculations. If your calculations are always using 2 players, there is however very much the chance, that you could split your data into `player 1 vs player 2` data styles, and perform the entire operation for these splits of your data, and gain quite a speed improvement. Basically convert your data into 'nice' formats first, and then perform calculations efficiently. – Oliver Sep 28 '19 at 18:53
  • A preview of your data using `dput` might be helpful, in this situation. Is the reason for the question code that is running slow? – Oliver Sep 28 '19 at 18:57

2 Answers2

1

From the comments we learned that the main problem of the question is calculating Elo ratings. Without much more info, i would expect the problem is optimizing code for speed.

Rather than using an apply function over a for loop a great speed improvement can be achieved, by first converting ratings into player 1 vs player 2 structures and converting code to use vectorized computations. Take the example implementation below:

Probability <- function(R1, R2)
    1 / (1 + 10^((R1 - R2)/400))
EloRating <- function(R1, R2, K, d){
    P1 <- Probability(R2, R1)
    P2 <- Probability(R1, R2)
    index <- d == 1 #Which matches did Player 1 win?
    #Use that d is 0 and 1's, and !d is 1 and 0's (reverse of d)
    R1 <- R1 + K * (d  - P1) 
    R2 <- R2 + K * (!d  - P1) 
    #output updated ratings
    return(list(Rating1 = R1, Rating2 = R2))
}

Note the lack of for loop and apply functions such as sapply.

This piece of code is highly optimized, as Probability can take any 2 vectors and return a vector of same length, using only the build-in functions of R which are themselves implemented in optimized C or Fortran code.

The EloRating function itself is also very optimized. First we calculate the probability (under normality assumption i'm guessing?) of either player winning. Next I'm assuming that an input vector d of victories are given, for which d[i]==1 is indicative of player 1 winning the match against player 2, and any other result being the reverse.

I like having player 1 winning being marked as 1, and having player 2 winning marked as 0, so i use d==1 to ensure that d only contains 1's (TRUE) and 0's FALSE.

When I've ensured this i can abuse that a logical vector can be inverted by using !d, reversing which elements are 1's and which are 0's. Thus i can perform all the rating changes simultaneously using only 2 lines of code.

For a bit of speed testing, let's run a microbenchmark for 1000 observed matches

set.seed(1)
n <- 1000
R1 <- rnorm(n, 1000, 50)
R2 <- rnorm(n, 1000, 50)
wins <- sample(1:2, n, replace = TRUE)
microbenchmark::microbenchmark(EloRating(R1, R2, 40, wins), times = 1e3)
#output
Unit: microseconds
                        expr     min     lq     mean  median      uq    max neval
 EloRating(R1, R2, 40, wins) 289.983 291.87 305.2853 293.381 309.239 626.03  1000

Note the median time is 2.93 * 1e-6 seconds per iteration for 1000 players.

Disclaimer

I have basically just converted code from GeeksforGeeks without any thought for it's correctness. I take no responsibility for any errors in the code, and i suggest checking against known results before using it.

Community
  • 1
  • 1
Oliver
  • 8,169
  • 3
  • 15
  • 37
  • Usually in a tie 0.5 points is given to each player, so `d` can't be expected to be binary. – AkselA Sep 28 '19 at 20:42
  • That is good to know. I simply just read the wiki and converted an example code on geeksforgeeks, so it is by all means not meant as the perfect example. :-) – Oliver Sep 28 '19 at 21:11
  • Thanks for the explanation, really appreciate you going into detail instead of just giving a quick solution. – Frostbitten Sep 28 '19 at 21:19
  • Glad i could help with the understanding. Remember to throw an upvote for helpful answers and the answer marked, to help others find which answers provided guides, and to incentivize future answers. :-) – Oliver Sep 28 '19 at 21:42
1

Similar idea to Oliver, as in no loops, but adapted to the existing data structure.

Example data

results <- read.table(text="
Player1   Player2   Result
Alice     Bob       Win
Charlie   Dennis    Win
Elena     Frank     Loss
June      Rashida   Tie", header=TRUE, stringsAsFactors=FALSE)

scores <- read.table(text="
Player    Score
Alice     1200
Charlie   1200
Frank     1200
Bob       800
Dennis    800
Elena     800
June      900
Rashida   1100", header=TRUE, stringsAsFactors=FALSE)

Functions

# expected score
exps <- function(ra, rb) {
    d <- (rb - ra)/400
    1/(1 + 10^d)
}

# update ratings
updrmulti <- function(ra, rb, score, k=16) {
    if (NCOL(ra) == 3) {
        rb <- ra[,2]
        score <- ra[,3]
        ra <- ra[,1]
    }
    ea <- exps(ra, rb)
    eb <- 1 - ea
    sa <- score
    sb <- 1 - sa
    cbind(ra=ra + k*(sa - ea), rb=rb + k*(sb - eb))
}

Implementation

# set Player names as rownames. Makes look-up easier
rownames(scores) <- scores$Player

# copy results data.frame
r2 <- results

# recode results to numeric
r2$Result <- (match(r2$Result, c("Loss", "Tie", "Win"))-1)/2

# replace Player names with their respective ratings
r2[,1:2] <- scores[as.matrix(r2[,1:2]), 2]

# pass new ratings+score data.frame to ratings-update function
r2u <- updrmulti(r2)

# cast updated ratings to new data.frame and give appropriate rownames
scores.new <- data.frame(Score=c(r2u))
rownames(scores.new) <- as.matrix(results[,1:2])

# order rows by rownames of original data.frame
scores.new[rownames(scores),, drop=FALSE]
#             Score
# Alice   1201.4545
# Charlie 1201.4545
# Frank   1201.4545
# Bob      798.5455
# Dennis   798.5455
# Elena    798.5455
# June     904.1560
# Rashida 1095.8440
AkselA
  • 8,153
  • 2
  • 21
  • 34