I have some code that looks like this:
```r
library(stringi)
library(dplyr)  # provides tibble(), rowwise() and mutate()

df_values <- data.frame(value = stri_rand_strings(n = 500,
                                                  length = 30))
df_keys <- tibble(key = sample(x = 1:500,
                               size = 25000,
                               replace = TRUE))

# start timer
start_time <- Sys.time()

df_keys |>
  rowwise() |>
  mutate(value = df_values$value[key])

# end timer
end_time <- Sys.time()
end_time - start_time
```
My real code takes a very long time to run, and I can't figure out why: the example above needs only 0.3003931 seconds. To measure my real code, I subsetted the tibble with `head(n)` and got the following times:
| n (rows) | time (s) |
|---|---|
| 50 | 1.993536 |
| 100 | 3.731 |
| 200 | 6.550074 |
| 300 | 9.500864 |
| 500 | 15.68515 |
| 1,000 | 32.19306 |
| … | seems to be linear |
| 20,000 | maybe 10 minutes (estimated) |
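For comparison, the same `head(n)` measurement could be repeated on the reproducible example data. The sketch below is only an assumption about how the timings were taken (using `system.time()` instead of `Sys.time()` differences); `time_rowwise()` and `ns` are hypothetical names, and `set.seed(1)` is added purely to make the random data reproducible.

```r
library(dplyr)
library(stringi)

set.seed(1)  # assumption: fixed seed so the random example data is reproducible
df_values <- data.frame(value = stri_rand_strings(n = 500, length = 30))
df_keys <- tibble(key = sample(x = 1:500, size = 25000, replace = TRUE))

# Hypothetical helper: elapsed seconds for the rowwise lookup on the first n rows
time_rowwise <- function(n) {
  system.time(
    head(df_keys, n) |>
      rowwise() |>
      mutate(value = df_values$value[key])
  )[["elapsed"]]
}

# Same row counts as in the table above
ns <- c(50, 100, 200, 300, 500, 1000)
timings <- data.frame(n = ns, secs = vapply(ns, time_rowwise, numeric(1)))
print(timings)
```

If these numbers stay far below the ones in the table, the slowdown is more likely tied to something in the real data than to the lookup pattern itself.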
Does anyone have an idea what could be wrong with my code? I suspect the indexing part, `df_values$value[key]`, but my original `df_values` is also a data.frame with 500 observations.
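One way to check whether the indexing or the `rowwise()` step is responsible is to time both side by side on the example data. This is a diagnostic sketch, not a confirmed explanation of the slowdown in the real code (whose data isn't shown); the seed and the `t_*`/`res_*` names are hypothetical. The vectorized variant simply drops `rowwise()`, so `df_values$value[key]` subsets once with the whole `key` column instead of once per row.

```r
library(dplyr)
library(stringi)

set.seed(1)  # assumption: fixed seed for reproducible example data
df_values <- data.frame(value = stri_rand_strings(n = 500, length = 30))
df_keys <- tibble(key = sample(x = 1:500, size = 25000, replace = TRUE))

# Original approach: the lookup expression is evaluated once per row
t_rowwise <- system.time(
  res_rowwise <- df_keys |>
    rowwise() |>
    mutate(value = df_values$value[key]) |>
    ungroup()
)

# Vectorized approach: a single subsetting call over the whole key column
t_vector <- system.time(
  res_vector <- df_keys |>
    mutate(value = df_values$value[key])
)

# Elapsed seconds for each approach
t_rowwise[["elapsed"]]
t_vector[["elapsed"]]

# Both approaches should yield the same looked-up values
identical(res_rowwise$value, res_vector$value)
```

If the indexing alone were slow, the vectorized version would be slow too; a large gap between the two points at the per-row evaluation instead.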