I am looking to generate a mean and variance value for every row in a numeric tibble. With my existing code, what I thought to be a very dplyr appropriate solution, it takes a number of hours to complete for 50,000 rows of about 35 columns.
Is there a way to speed up this operation using only dplyr? I know apply and purrr are options, but I am mostly curious if there is something about dplyr I'm overlooking when performing a large series of calculations like this.
Reproducible example:
library(tidyverse)
library(vroom)
gen_tbl(50000, cols = 40,
col_types = paste0(rep("d", 40), collapse = "")) %>%
rowwise() %>%
mutate(mean = mean(c_across()),
var = var(c_across()))
My suspicion lies with rowwise()
but I am interested if there is a more nuanced way to solving this with dplyr or if it is just not a problem dplyr is good at.