Overview
I think you have two main options about the logic behind the problem:
- For each option o in 1-26, build an index matrix
draft_simulations == o
, then rowSums
it;
- For each player p in 1-26, table its row
table(draft_simulations[p,])
.
On top of that choice, I'd say you have two main possible efficiency gains:
- Use better functions;
- Use a better looping structure.
Better functions
For (1.) I think rowSums
and ==
are about as good as it gets. But for table
, there are a lot of different options, as you can see on Is there an efficient alternative to table()?. I'd recommend reading all the answers to that question, as each have their pros and cons, but here, I'll stick to base::tabulate
.
Better looping structure
You have for loops, the base::apply
family, and purrr::map
. Is not obvious which to use, so I'll test below.
Testing
I tested options using rowSums
, table
, and tabulate
, with all the looping options (code at the end), which yielded:
bench::mark(rowsums_for(), rowsums_sapply(), rowsums_map(),
table_for(), table_apply(),
tabulate_for(), tabulate_apply(),
check = FALSE, min_time = 1) %>%
select(expression:`gc/sec`)
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 rowsums_for() 3.08ms 4ms 260. 2.59MB 9.04
2 rowsums_sapply() 3.14ms 4.13ms 251. 2.6MB 9.02
3 rowsums_map() 3.27ms 4.3ms 241. 2.59MB 9.01
4 table_for() 4.41ms 4.73ms 208. 1.87MB 5.33
5 table_apply() 4.59ms 4.96ms 200. 1.98MB 5.33
6 tabulate_for() 231.2µs 250.7µs 3788. 210.94KB 11.9
7 tabulate_apply() 289.2µs 311µs 3090. 312.8KB 11.9
So, at least for the dimension of your problem, for loops are better. Also, rowSums
is better than table
, but tabulate
is the clear winner. This isn't necessarily true if the dimensions are different.
Functions:
plays <- 26
players <- 26
n_sims <- 1000
random_probs <- rbeta(plays, 10, 20)
random_probs <- random_probs/sum(random_probs)
draft_simulations <-
replicate(n_sims, sample(1:plays, players, replace = FALSE, prob = random_probs)) %>%
matrix(nrow = players, ncol = n_sims)
rowsums_for <- function(){
result <- matrix(nrow = players, ncol = plays)
for(i in 1:plays){
result[,i] <- rowSums(draft_simulations == i)
}
}
rowsums_sapply <- function(){
sapply(1:players, function(x) rowSums(draft_simulations == x))
}
rowsums_map <- function(){
map(1:players, ~ rowSums(draft_simulations == .x))
}
table_for <- function(){
result <- matrix(nrow = players, ncol = plays)
for(i in 1:players){
result[i,] <- table(draft_simulations[i,])
}
}
table_apply <- function(){
apply(draft_simulations, 1, table)
}
tabulate_for <- function(){
result <- matrix(nrow = players, ncol = plays)
for(i in 1:players){
result[i,] <- tabulate(draft_simulations[i,])
}
}
tabulate_apply <- function(){
apply(draft_simulations, 1, tabulate)
}