
I have a function that takes a long time to run, so I want to know how many rows of my data frame have been processed so far. In a for loop this is easy to do with a counter variable, but I do not know how to do it in dplyr.

Let's say the code is:

library(tidyverse)

myFUN <- function(x) {
  x + 1
}

a <- tibble(id=c(1:3),x=c(3,5,1))

a1 <- a %>%
  rowwise() %>%
  mutate(y=myFUN(x))

I would like to define a variable i somewhere in the code that is incremented by 1 each time a row is processed, with its value printed to the console, like:

1
2
3
halfer
Feng Chen
  • As per past comments, when asking questions, we would rather they were written in a technical and neutral style. Chatty material that begs, pleads and implores readers for an answer may be thought of as somewhat coercive, and is not really appropriate for the volunteer audience. Please keep it succinct. Thank you! – halfer Nov 13 '19 at 11:52

1 Answer

You could pass another argument to the function, namely the row number of the data frame, and print it inside the function. Something like:

myFUN <-function (x, y) {
   message(y)
   x + 1
}

and then use

library(dplyr)
a %>%  mutate(y = purrr::map2_dbl(x, row_number(), myFUN))

#1 
#2 
#3 
# A tibble: 3 x 3
#     id     x     y
#  <int> <dbl> <dbl>
#1     1     3     4
#2     2     5     6
#3     3     1     2

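If your function is vectorized, you can drop map2_dbl and pass the row numbers directly (the function is then called only once, so the row numbers are printed together rather than one at a time):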

a %>% mutate(y= myFUN(x, seq_len(n())))
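
As an alternative that leaves the signature of myFUN unchanged, a closure can carry the counter. This is only a minimal sketch: with_counter is a hypothetical helper name, not part of dplyr or purrr, and base R's vapply stands in for the row-by-row calls so that the example is self-contained.

```r
# Hypothetical helper: wraps a function so that every call
# increments a counter and reports it with message().
with_counter <- function(f) {
  i <- 0
  function(...) {
    i <<- i + 1   # update the counter stored in the enclosing environment
    message(i)    # progress goes to stderr, so it can be suppressed later
    f(...)
  }
}

myFUN <- function(x) {
  x + 1
}

counted_FUN <- with_counter(myFUN)

x <- c(3, 5, 1)

# One call per element, mimicking one call per row
y <- vapply(x, counted_FUN, numeric(1))
y
```

Since rowwise() evaluates the mutate expression once per row, the same wrapper should also work as a %>% rowwise() %>% mutate(y = counted_FUN(x)). Note that the counter keeps its value between pipelines, so create a fresh wrapper with with_counter(myFUN) before each run.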
Ronak Shah
  • Good answer---much easier to modify the user function to do this than to try to mess with `dplyr` internals. I would suggest `message` rather than `cat`. – Gregor Thomas Nov 12 '19 at 04:51
  • @Gregor Thanks, updated the answer :). Is there any advantage or disadvantage to using `message` over `cat`? – Ronak Shah Nov 12 '19 at 05:05
  • Using `message` makes it easy to disable. If you're in a situation where you *don't* want the messages, wrap it in `suppressMessages`, or `purrr::quietly` will capture the messages separately. In an Rmd, you can set chunk option `message = FALSE`. If you use `cat`, you have less flexibility, and disabling print statements using `cat` [takes more work/less efficient workarounds](https://stackoverflow.com/q/34208564/903061). There's a more complete treatment at [Why is message() a better choice than print() in R for writing a package?](https://stackoverflow.com/a/36700294/903061). – Gregor Thomas Nov 12 '19 at 18:19
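
To illustrate the point about `suppressMessages` made in the comment above, here is a small sketch. It reuses the two-argument myFUN from the answer; base R's mapply is used instead of purrr::map2_dbl only to keep the example free of package dependencies.

```r
myFUN <- function(x, y) {
  message(y)  # progress report: the row number
  x + 1
}

x <- c(3, 5, 1)

# Row numbers are printed as each element is processed
res_loud <- mapply(myFUN, x, seq_along(x))

# Same computation with the progress output switched off
res_quiet <- suppressMessages(mapply(myFUN, x, seq_along(x)))

# The results are the same either way
identical(res_loud, res_quiet)
```

Had the function used `cat` instead, `suppressMessages` would have no effect, because `cat` writes to standard output rather than signalling a message condition.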