I am currently running some functions on large data sets for which each operation takes a long time to execute.

To see the progress of my calculations, it would be handy to print the iterations/percentage of completed calculations. With loops, this can be easily done.

However, is it possible to get something similar for vectorized or pre-defined functions without changing the source code of those functions?

Example data:

generate_string is taken from the question "Generating Random Strings":

generate_string <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
x <- generate_string(10000)
y <- generate_string(10000)

Example function to be monitored (i.e. the call for which I would like to print the percentage completed):

library(stringdist)
# amatch will find for each element in x the index of the most similar element in y
ind <- amatch(x,y, method = "jw", maxDist = 1)
Fred
  • 1
    So you want to hand out a task to function you have no control over, and while it is working "behind closed door", without interrupting it, having it report to you how it is progressing? – MrGumble Aug 15 '18 at 09:24
  • exactly (if that is possible at all...) – Fred Aug 15 '18 at 09:27
  • 1
    I was thinking of using `promises` to send work into a processor and then ping it once in a while to see if it's done. That's not progress bar, but it would indicate the process is live or at least still being calculated. Would be interested to see if there's a chance of implementing a progress bar for a function you have no access to. – Roman Luštrik Aug 15 '18 at 09:41
  • @RomanLuštrik my only idea currently would be to simply split the data into several chunks and apply the function on each chunk along with some progress information, but I was hoping there is some more elegant solution to it. – Fred Aug 15 '18 at 09:57
  • 1
    How about getting a sound instead of a progress bar? `beepr` could generate a sound and you don't have to look at the screen and do something else in the mean time. Of course you will not know where you are with the program. – phiver Aug 15 '18 at 10:07
  • If the process is such that you can split it, that would be a viable option. – Roman Luštrik Aug 15 '18 at 10:09

1 Answer

The pbapply package is an option, but it is slower than the direct call, because amatch is then called once per element of x instead of once on the whole vector:

system.time({ind <- amatch(x,y, method = "jw", maxDist = 1)})
   user  system elapsed 
  27.79    0.05    9.72 

library(pbapply)
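# USE.NAMES = FALSE keeps the result an unnamed index vector, like the direct call
ind <- pbsapply(x, function(xi) amatch(xi, y, method = "jw", maxDist = 1), USE.NAMES = FALSE)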
 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 30s

The option you mention in the comments (splitting the data into chunks) is less elegant but faster, and it is easy to parallelize.

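library(progress)
system.time({
  nloops <- 20
  # assign each element of x to one of nloops contiguous, roughly equal chunks
  pp <- floor(nloops * (0:(length(x) - 1)) / length(x)) + 1
  ind <- vector("list", nloops)  # preallocate instead of growing a vector
  pb <- progress_bar$new(total = nloops)
  pb$tick(0)  # show the bar at 0% before the first chunk starts
  for (i in seq_len(nloops)) {
    ind[[i]] <- amatch(x[pp == i], y, method = "jw", maxDist = 1)
    pb$tick()  # advance only after the chunk has actually finished
  }
  ind <- unlist(ind)
})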
[===================================================================================] 100%
   user  system elapsed 
  25.96    0.06    9.21
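
The chunked approach can also be combined with pbapply, which draws the progress bar itself and accepts a cluster through its `cl` argument. Below is a minimal sketch, not a tuned implementation: `amatch_chunked`, `x_small` and `y_small` are hypothetical names made up for this example, and the small random strings stand in for the real data. Note that `amatch` has its own `nthread` argument and may already use several threads, so running chunks on a cluster is not guaranteed to be faster.

```r
library(stringdist)
library(pbapply)

# Hypothetical helper (not part of stringdist or pbapply):
# split x into n_chunks contiguous chunks, run amatch on each chunk
# and report progress per finished chunk.
amatch_chunked <- function(x, table, n_chunks = 20, cl = NULL, ...) {
  # chunk ids 1..n_chunks, in the original order of x
  chunk_id <- ceiling(seq_along(x) * n_chunks / length(x))
  chunks <- split(x, chunk_id)
  # cl = NULL runs serially; a cluster from parallel::makeCluster()
  # distributes the chunks over its workers
  res <- pblapply(chunks, function(ch) stringdist::amatch(ch, table, ...), cl = cl)
  unlist(res, use.names = FALSE)
}

# Small stand-in data so the example runs quickly
set.seed(1)
x_small <- replicate(200, paste(sample(LETTERS, 8, TRUE), collapse = ""))
y_small <- replicate(200, paste(sample(LETTERS, 8, TRUE), collapse = ""))

ind_chunked <- amatch_chunked(x_small, y_small, n_chunks = 10,
                              method = "jw", maxDist = 1)
```

To run the chunks in parallel, create a cluster with `cl <- parallel::makeCluster(2)`, pass it as `cl = cl`, and call `parallel::stopCluster(cl)` afterwards. More chunks give finer progress updates but add a little overhead per call.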