I'm trying to understand purrr and I'm currently working with pmap. pmap can be used to call a predefined function, using the values in each row of a data.frame as arguments to the function call. I would like to see the current progress, as my data.frames might have several thousand rows.
How can I print the row that pmap is currently running on, ideally together with the total number of rows in the data.frame?
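To make the row-wise behaviour concrete, here is a minimal sketch on a toy data.frame. The names `toy_grid` and `add_xy` are made up for illustration and are not part of the ranger example further down.

```r
library(purrr)

# Hypothetical toy grid, only for illustration
toy_grid <- data.frame(x = 1:3, y = c(10, 20, 30))

# Hypothetical helper: arguments are matched to the column names
add_xy <- function(x, y, ...) x + y

# pmap() calls add_xy() once per row and returns a list with one element per row
res <- pmap(toy_grid, add_xy)
```

Each call receives a single value per column, so the function itself has no built-in knowledge of which row it is working on.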
I tried to include a counter like in a for loop, and I also tried to capture the current row using current <- data.frame(...) and then row.names(current) (idea from here: https://blog.az.sg/posts/map-and-walk/), but in both cases it always prints 1.
Thanks for helping.
For reproducibility, let's use the code from the question that brought me to purrr::pmap (How to use expand.grid values to run various model hyperparameter combinations for ranger in R):
library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris) # let's assume these are different input tables
# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Importance = c("none", "impurity"),
  Classification = TRUE,
  Repeats = 1:5,
  Target = "Species")
# the function to be called for each row of the `hyper_grid` df
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target,
    data = Input_list[[Input_table]], # referring to the named object in the list
    num.trees = Trees,
    importance = Importance,
    classification = Classification) # otherwise regression is performed
  data.frame(Prediction_error = RF_train$prediction.error,
             True_positive = RF_train$confusion.matrix[1])
}
# the pmap call using a row of hyper_grid and the function in parallel
hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
I tried two things:
counter <- 0
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  counter <- counter + 1
  print(paste(counter, "of", nrow(hyper_grid)))
  # rest of the function
}
# and also
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  current <- data.frame(...)
  print(paste(row.names(current), "of", nrow(hyper_grid)))
  # rest of the function
}
# both return
> hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
[1] "1 of 40"
[1] "1 of 40"
[1] "1 of 40"
...
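A possible workaround, sketched on a toy grid rather than the real ranger setup: pass the row number in as an extra column, so the function receives it like any other argument. The names `toy_grid`, `Row`, `n_total` and `show_progress` are hypothetical stand-ins, and whether this is the idiomatic purrr way is exactly what I'm unsure about.

```r
library(purrr)

# Hypothetical toy grid standing in for hyper_grid
toy_grid <- expand.grid(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

# Add an explicit row index column so each call knows its position
toy_grid$Row <- seq_len(nrow(toy_grid))
n_total <- nrow(toy_grid)

# Hypothetical stand-in for fit_and_extract_metrics
show_progress <- function(a, b, Row, ...) {
  message(Row, " of ", n_total)  # progress message for the current row
  paste(a, b)                    # placeholder for the real work
}

res <- pmap(toy_grid, show_progress)
```

This avoids relying on a counter defined outside the function, at the cost of adding a helper column to the grid.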