
I'm trying to understand purrr, and I'm currently working with pmap. pmap can be used to call a predefined function, using the values in each row of a data.frame as arguments to the function call. I would like to know the current progress, since my data.frames might have several thousand rows.

How can I print the row that pmap is currently working on, possibly together with the total number of rows in the data.frame?

I tried to include a counter like in a for loop and also tried to capture the current row using

current <- data.frame(...)

and then row.names(current)

(idea from here: https://blog.az.sg/posts/map-and-walk/)

but in both cases it always prints 1.

Thanks for helping.

For reproducibility, let's use the code from the question that brought me to purrr::pmap (How to use expand.grid values to run various model hyperparameter combinations for ranger in R):

library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris)  # let's assume these are different input tables

# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Importance = c("none", "impurity"),
  Classification = TRUE,
  Repeats = 1:5,
  Target = "Species",
  stringsAsFactors = FALSE)  # keep Input_table, Importance and Target as character, not factors

# the function to be called for each row of the `hyper_grid` data.frame
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target, 
    data = Input_list[[Input_table]],  # referring to the named object in the list
    num.trees = Trees, 
    importance = Importance, 
    classification = Classification)  # otherwise regression is performed

  data.frame(Prediction_error = RF_train$prediction.error,
             True_positive = RF_train$confusion.matrix[1])
}

# pmap calls the function once per row of hyper_grid, passing that row's columns as named arguments
hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
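To make the mechanics of that call concrete, here is a minimal sketch on a made-up toy grid (`toy_grid` and `add_ab` are hypothetical names, not part of the original code). pmap matches each column to the function argument of the same name, and any column without a matching argument (like `Repeats` in `hyper_grid`) is absorbed by `...`:

```r
library(purrr)

# Hypothetical toy grid standing in for hyper_grid.
toy_grid <- data.frame(a = 1:3, b = c(10, 20, 30), extra = c("x", "y", "z"))

# `a` and `b` are matched by column name; the unmatched `extra`
# column is passed into `...` and simply ignored.
add_ab <- function(a, b, ...) a + b

# One call per row; pmap_dbl() collects the results as a numeric vector.
sums <- pmap_dbl(toy_grid, add_ab)
print(sums)
```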

I tried two things:

counter <- 0
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  counter <- counter + 1
  print(paste(counter, "of", nrow(hyper_grid)))
  # rest of the function
}

# and also 
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  current <- data.frame(...)
  print(paste(row.names(current), "of", nrow(hyper_grid)))
  # rest of the function
}

# both return
> hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
[1] "1 of 40"
[1] "1 of 40"
[1] "1 of 40"
...
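Why both attempts print 1 can be sketched in base R (the helper names `bump_local`, `bump_global` and `one_row` are hypothetical). Inside a function, `counter <- counter + 1` creates a local copy, so the global `counter` never changes; `<<-` would assign to the enclosing one instead. And `data.frame(...)` builds a brand-new one-row data.frame on every call, so its row name is always "1":

```r
# Attempt 1: plain `<-` inside the function only changes a local copy.
counter <- 0
bump_local <- function() {
  counter <- counter + 1  # local assignment; the global counter is untouched
  counter
}
r1 <- bump_local()
r2 <- bump_local()
after_local <- counter  # still 0, and both calls returned 1

# `<<-` assigns to `counter` in the enclosing (here: global) environment.
bump_global <- function() {
  counter <<- counter + 1
  counter
}
r3 <- bump_global()
r4 <- bump_global()
print(counter)  # 2

# Attempt 2: data.frame(...) creates a fresh one-row data.frame per call,
# so row.names() can only ever return "1".
one_row <- function(...) row.names(data.frame(...))
print(one_row(x = 5, y = "a"))
```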
crazysantaclaus
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please show the code you actually tried. – MrFlick Mar 31 '20 at 16:07
  • @MrFlick: Yes, I was a little lazy; I thought it might be a common task where no data is needed. But I added an example and also the two things I tried – crazysantaclaus Mar 31 '20 at 19:46

1 Answer


Since you are already using pmap, the easiest way would just be to pass the rownames as well.

You can do something like

hyper_grid$res <- purrr::pmap(cbind(hyper_grid, .row=rownames(hyper_grid)), fit_and_extract_metrics)

which adds a .row column containing the row names. Then, in your iterating function, you can do

fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ..., .row) {
  print(paste(.row, "of", nrow(hyper_grid)))
  # rest of the function
}

Notice that I added a .row parameter to the function to capture that new column we added.
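A runnable version of the same idea, without ranger, might look like the sketch below. `demo_grid` and `fake_fit` are hypothetical stand-ins for `hyper_grid` and `fit_and_extract_metrics`, and `message()` is used instead of `print(paste(...))` only as a matter of taste (it writes to stderr, so it does not mix with returned values). Because `.row` comes after `...` in the signature, it can only be matched by its exact name, which is what pmap does:

```r
library(purrr)

# Hypothetical stand-in for hyper_grid: four parameter combinations.
demo_grid <- expand.grid(Trees = c(10, 20),
                         Importance = c("none", "impurity"),
                         stringsAsFactors = FALSE)

# Hypothetical stand-in for fit_and_extract_metrics(): reports progress
# and returns a one-row data.frame instead of fitting a model.
fake_fit <- function(Trees, Importance, ..., .row) {
  message(.row, " of ", nrow(demo_grid))  # "1 of 4", "2 of 4", ...
  data.frame(Trees = Trees, Importance = Importance)
}

# Append the row names as a `.row` column so pmap() passes them to `.row`.
demo_grid$res <- pmap(cbind(demo_grid, .row = rownames(demo_grid)), fake_fit)
```

Row names are character strings; if an integer position is preferred, `.row = seq_len(nrow(demo_grid))` works the same way.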

Note that map() and walk() have variants, imap() and iwalk(), that make getting the index a bit easier. pmap() has no ipmap(), presumably because you already have to build the whole list of arguments yourself, so it makes sense to pass in the names or indices you want along with them.
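For comparison, a small sketch of the indexed variants next to the hand-built index for pmap (the list `inputs` and its contents are hypothetical). imap() and iwalk() hand the function each element plus its name (or its position, if the input is unnamed); with pmap() the index is simply one more element of the argument list:

```r
library(purrr)

inputs <- list(iris1 = "a", iris2 = "b")  # hypothetical named list

# imap_chr(): the second argument receives the element's name.
labels <- imap_chr(inputs, function(x, name) paste0(name, ": ", x))
print(labels)

# iwalk(): same idea, called only for side effects such as progress output.
iwalk(inputs, function(x, name) message("processing ", name))

# pmap_chr(): no index variant, so the index is passed as its own argument.
idx <- pmap_chr(list(x = c("a", "b"), i = 1:2),
                function(x, i) paste0(i, " of 2: ", x))
print(idx)
```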

MrFlick
  • ok this works perfectly, but there is quite a lot about purrr I have to figure out. Thanks for showing me how to feed more information into the function call! – crazysantaclaus Mar 31 '20 at 20:45