
I am fitting a random forest with a single numeric predictor.

The data are stored in a tibble [617,622 x 29] (S3: tbl_df/tbl/data.frame), which I split as follows:

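library(tidymodels)  # rsample, recipes, parsnip, workflows, tune, dials, yardstick
library(themis)      # step_smote()
library(finetune)    # tune_race_anova(), tune_race_win_loss(), control_race()

set.seed(123)
data_split <- initial_split(data, strata = var_class, prop = .70)
data_train <- training(data_split)
data_test <- as.data.frame(testing(data_split))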

With the following recipe object and workflow:

rec_1v_s <- data_train %>% 
  recipe(var_class ~ var1) %>%
  step_naomit(everything(), skip = TRUE) %>% 
  step_normalize(all_numeric()) %>%
  step_smote(var_class)

model_to_tune <- rand_forest(mode = "classification", trees = tune()) %>%
  set_engine("ranger")

wflow_rf_1v <-
  workflow() %>%
  add_model(model_to_tune) %>%
  add_recipe(rec_1v_s)

I want to tune the number of trees.

set.seed(123)
rf_grid <- grid_latin_hypercube(
  trees(),
  size = 40)
race_ctrl <-
  control_race(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = FALSE
  )

Then I tune the number of trees of the random forest:

tictoc::tic()
all_cores <- parallel::detectCores(logical = FALSE)
library(doFuture)
registerDoFuture()
cl <- parallel::makeCluster(all_cores-4)
plan(cluster, workers = cl)

# Option 1 tune_race

rf_tune_race <- wflow_rf_1v %>% 
  tune_race_win_loss(resamples = folds,
            grid = rf_grid,
            control = race_ctrl,
            metrics = metric_set(roc_auc, accuracy))

# Option 2 tune_grid
rf_tune_grid <- wflow_rf_1v %>% 
  tune_grid(resamples = folds,
            grid = rf_grid,
            control = race_ctrl,
            metrics = metric_set(roc_auc, accuracy))
tictoc::toc()

With the same specification, tune_grid runs without errors and returns a result, but tune_race (either tune_race_anova or tune_race_win_loss) fails with the following error:

Error: arrange() failed at implicit mutate() step. x Can't recycle input of size 0 to size 1

The error persists with both tune_race_anova and tune_race_win_loss.

The error does not provide much information and I cannot detect where it comes from.

In case it is helpful, here is the detail of the error provided by rlang:

rlang::last_error()
<error/dplyr_error>
arrange() failed at implicit mutate() step. 
x Can't recycle input of size 0 to size 1.
Backtrace:
Run `rlang::last_trace()` to see the full context.

rlang::last_trace()
<error/dplyr_error>
arrange() failed at implicit mutate() step. 
x Can't recycle input of size 0 to size 1.
Backtrace:
     x
  1. +-`%>%`(...)
  2. +-finetune::tune_race_win_loss(...)
  3. +-finetune:::tune_race_win_loss.workflow(...)
  4. | \-finetune:::tune_race_win_loss_workflow(...)
  5. |   \-`%>%`(...)
  6. +-tune::tune_grid(...)
  7. +-tune:::tune_grid.workflow(...)
  8. | \-tune:::tune_grid_workflow(...)
  9. |   \-tune:::tune_grid_loop(...)
 10. |     \-tune:::pull_metrics(resamples, results, control)
 11. |       \-tune:::pulley(resamples, res, ".metrics")
 12. |         +-dplyr::arrange(resamples, !!!syms(id_cols))
 13. |         \-dplyr:::arrange.data.frame(resamples, !!!syms(id_cols))
 14. |           \-dplyr:::arrange_rows(.data, dots)
 15. |             +-base::withCallingHandlers(...)
 16. |             +-dplyr::transmute(new_data_frame(.data), !!!quosures)
 17. |             \-dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
 18. |               +-dplyr::mutate(.data, !!!dots, .keep = "none")
 19. |               \-dplyr:::mutate.data.frame(.data, !!!dots, .keep = "none")
 20. |                 +-dplyr::dplyr_col_modify(.data, cols)
 21. |                 \-dplyr:::dplyr_col_modify.data.frame(.data, cols)
 22. |                   \-vctrs::vec_recycle_common(!!!cols, .size = nrow(data))
 23. +-vctrs:::stop_recycle_incompatible_size(...)
 24. | \-vctrs:::stop_vctrs(...)
 25. |   \-rlang::abort(message, class = c(class, "vctrs_error"), ...)
 26. |     \-rlang:::signal_abort(cnd)
 27. |       \-base::signalCondition(cnd)
 28. \-(function (cnd) ...
  • I tried a different way of specifying the grid and it worked, but I don't understand why: 1) this gives the error: `rf_grid <- grid_latin_hypercube( trees(), size = 40)`; 2) this does not: `rf_race <- wflow_rf_D %>% parameters() %>% grid_latin_hypercube(size = 25)` – Joselina Davyt-Colo Oct 18 '21 at 22:24
  • I suspect this is happening because of how you are using `step_naomit()`; check out [this info on row sampling steps](https://www.tmwr.org/recipes.html#skip-equals-true). What happens if you omit `NA` data _ahead of time_ and remove that step? – Julia Silge Oct 22 '21 at 22:31
  • I found a similar error in a similar context. Did you succeed in finding a solution to this, @JoselinaDavyt-Colo ? – Marc Kees May 07 '22 at 23:36
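
The suggestion in the comments (drop `NA` rows ahead of time and remove `step_naomit()` from the recipe) could look roughly like the sketch below. It is only an illustration, and the thread does not confirm that it resolves the racing error. `toy_data` is a hypothetical stand-in for the real tibble, and the `vfold_cv()` call is an assumption, because the question never shows how `folds` was created.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, tidyr, ...
library(themis)      # provides step_smote()

# Hypothetical stand-in for the real 617,622 x 29 tibble
set.seed(123)
toy_data <- tibble(
  var_class = factor(sample(c("a", "b"), 500, replace = TRUE, prob = c(0.8, 0.2))),
  var1      = rnorm(500)
)
toy_data$var1[sample(500, 25)] <- NA  # inject some missing values

# Remove incomplete rows once, before splitting, instead of inside the recipe
data_complete <- tidyr::drop_na(toy_data, var_class, var1)

data_split <- initial_split(data_complete, strata = var_class, prop = 0.70)
data_train <- training(data_split)
data_test  <- testing(data_split)

# Same recipe as in the question, minus step_naomit()
rec_1v_s <- recipe(var_class ~ var1, data = data_train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_smote(var_class)

model_to_tune <- rand_forest(mode = "classification", trees = tune()) %>%
  set_engine("ranger")

wflow_rf_1v <- workflow() %>%
  add_model(model_to_tune) %>%
  add_recipe(rec_1v_s)

# Assumed resampling scheme (not shown in the question), built from NA-free data
folds <- vfold_cv(data_train, v = 5, strata = var_class)
```

The tuning calls from the question (`tune_grid()` or `tune_race_win_loss()` with `resamples = folds`) would then be run unchanged on this workflow.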
