
Is it possible to monitor the progress of a vectorized operation in R? E.g. in a loop one can always do if (i %% 10000 == 0) print(i) to see which element the code is currently working on. My gut feeling is "probably not", but maybe I'm wrong?

Alexey Ferapontov
  • Exact code doesn't matter, it's the general concept. Let it be `gsub("hi","lo",vector)`, where `vector = rep("hi",1000000)` – Alexey Ferapontov Jun 09 '16 at 18:04
  • Like you, I would say no. The looping in vectorized functions is done at the source-code level, and vectorization (oversimplified) means passing one chunk of data in and getting one chunk back. With a for loop, you pass one chunk per iteration, so you can count the chunks (n = 10000); with a vectorized operation, you don't have that granularity (n = 1). – rawr Jun 09 '16 at 18:13

1 Answer


In my comment, I asked what your code is and how you achieve vectorization. I think this matters. Generally speaking, vectorization is achieved with loops in compiled code, but I am not certain this holds for every function, so I hesitate to say "absolutely no".

However, if you want to track progress at the R level, you must be able to get an index, like the i used in an R-level for loop. Now, let's check what most vectorized R functions look like:

> grep
function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, 
    fixed = FALSE, useBytes = FALSE, invert = FALSE) 
{
    if (!is.character(x)) 
        x <- structure(as.character(x), names = names(x))
    .Internal(grep(as.character(pattern), x, ignore.case, value, 
        perl, fixed, useBytes, invert))
}
<bytecode: 0xa34dfe0>
<environment: namespace:base>

> gsub
function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, 
    fixed = FALSE, useBytes = FALSE) 
{
    if (!is.character(x)) 
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement), 
        x, ignore.case, perl, fixed, useBytes))
}

In the above examples, we see that those vectorized R functions are merely thin wrappers around compiled code (see the .Internal() calls). There is no explicit loop index for you to refer to. Hence, for those example functions, tracking progress is not possible.
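The same check can be done programmatically rather than by reading the printed source. This is only a rough sketch: `calls_internal` is a hypothetical helper name (not part of base R), and it merely tests whether the symbol `.Internal` appears anywhere in the function body.

```r
# Hypothetical helper: TRUE if the body of f mentions .Internal anywhere.
# all.names() lists every symbol (including function names) in an expression.
calls_internal <- function(f) {
  ".Internal" %in% all.names(body(f))
}

calls_internal(gsub)
calls_internal(grep)
calls_internal(function(x) x + 1)  # a plain R closure with no .Internal call
```

Primitive functions such as `sum` have no R-level body at all, so this helper reports FALSE for them; `is.primitive(sum)` is the check to use there.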

I suggest you have a look at the particular function you used. That is the best way to convince yourself.
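If coarse progress is good enough, one workaround is to give up a little vectorization: split the input into chunks at the R level and call the vectorized function once per chunk, reporting progress in between. The sketch below is an illustration, not something the functions above provide. `chunked_apply` and `chunk_size` are made-up names, and it assumes the function returns exactly one atomic value per input element (as `gsub` does).

```r
# Hypothetical helper: run a vectorized function f over x chunk by chunk,
# updating a text progress bar after each chunk.
chunked_apply <- function(x, f, chunk_size = 100000L, ...) {
  n <- length(x)
  if (n == 0L) return(f(x, ...))
  # Group indices into consecutive chunks of at most chunk_size elements
  idx <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))
  out <- vector("list", length(idx))
  done <- 0L
  for (k in seq_along(idx)) {
    out[[k]] <- f(x[idx[[k]]], ...)      # each call is still opaque compiled code
    done <- done + length(idx[[k]])
    setTxtProgressBar(pb, done)          # progress is only known between chunks
  }
  unlist(out, use.names = FALSE)
}

x <- rep("hi", 1000000)
result <- chunked_apply(x, function(v) gsub("hi", "lo", v))
```

The resolution of the progress report is one chunk, and each extra call adds some R-level overhead, so `chunk_size` is a trade-off between feedback and speed.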


follow up

Originally, I put lapply in my examples:

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<bytecode: 0x9c5c464>
<environment: namespace:base>

Then @RichardScriven expressed his view of the *apply family. On Stack Overflow, these two posts/answers are extremely useful for understanding vectorization issues in R:

Indeed, although lapply calls C code to do the loop, it still has to evaluate the R function FUN at every iteration. Hence:

  • if FUN dominates the execution time, then lapply will have no noticeable advantage over R's for loop.
  • if FUN does so little work that the loop overhead dominates the execution time, then lapply will have a noticeable advantage over R's for loop, because a for loop in C is more lightweight.
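Because lapply evaluates an R function for every element, that function is a place where progress can be reported, unlike inside gsub or grep. A minimal sketch, assuming you loop over indices rather than values; the reporting interval of 100 and the variable names are arbitrary choices for illustration:

```r
x <- rep("hi", 1000)

res <- lapply(seq_along(x), function(i) {
  # FUN runs at the R level for each index, so i is available here
  if (i %% 100 == 0) message("processed ", i, " of ", length(x))
  gsub("hi", "lo", x[i])
})

res <- unlist(res)
```

Note that this calls gsub once per element, so it throws away the vectorization of gsub itself; it only illustrates where an index becomes visible.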

Discussing the performance of lapply is off-topic in this post, so I will not attach examples for demonstration.

Zheyuan Li