R - Running time by line of code in dplyr

Question

When I want to estimate the running time of some R code, I use the function system.time().

library(dplyr)
library(tidyr)  # replace_na() comes from tidyr

system.time({
    Titanic %>%
        as.data.frame() %>%
        mutate(Dataset = 1) %>%
        bind_rows(as.data.frame(Titanic)) %>%
        mutate_all(funs(replace_na(., NA))) %>% 
        filter(Dataset != 1)
})

#    user  system elapsed 
#    0.02    0.00    0.02

Question: Is there a way to get the running time of each operation between pipes (the mutate, then the bind_rows, then the filter, etc.) without running them one by one and without writing several system.time() calls?

In this example it is not useful, but sometimes I receive a long script with a long running time, and I would like to identify which operations are the slowest.

I did some research but didn't find anything useful.

Possibly related: https://stackoverflow.com/questions/30119628/stepping-through-a-pipeline-with-intermediate-results — MrFlick, Apr 05 '19 at 20:16
I like the `tictoc` library for this. Add a `tictoc::tic("Step 1")` to start a clock and `tictoc::toc()` to end it. Or if you want to get fancier, https://rstudio.github.io/profvis/ – Jon Spring, Apr 05 '19 at 20:19
Learn more about profiling: https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio — F. Privé, Apr 06 '19 at 05:28

score 8 · Accepted Answer · answered Apr 08 '19 at 08:39

You might be interested in the `%L>%` pipe from my package pipes:

# devtools::install_github("moodymudskipper/pipes")
library(dplyr)  # mutate(), bind_rows(), mutate_all(), filter()
library(tidyr)  # replace_na()
library(pipes)
Titanic %L>%
  as.data.frame() %L>%
  mutate(Dataset = 1) %L>%
  bind_rows(as.data.frame(Titanic)) %L>%
  mutate_all(list(~replace_na(., NA))) %L>% 
  filter(Dataset != 1)

# as.data.frame(.)   ~  0.03 sec
# mutate(., Dataset = 1)   ~  0 sec
# bind_rows(., as.data.frame(Titanic))   ~  0 sec
# mutate_all(., list(~replace_na(., NA)))   ~  0 sec
# filter(., Dataset != 1)   ~  0.03 sec
# [1] Class    Sex      Age      Survived Freq     Dataset 
# <0 rows> (or 0-length row.names)

moodymudskipper

Awesome! One question about `%L>%`: does it noticeably increase the running time of the code (by computing the different times and printing the result)? Say I have a long piped block that I run several times. Should I use `%L>%` once to see which part is expensive and then remove it, or can I keep it on every run without much impact? – demarsylvain Apr 09 '19 at 18:13
You can print `%L>%` (surrounded by backticks) and you will see what additional code is used, it is some basic string manipulation and a call to `system.time`, so it should not have a significant impact, and certainly not next to an expensive call. – moodymudskipper Apr 09 '19 at 18:29
The package seems unavailable (at least deprecated) https://github.com/moodymudskipper/pipes – Julien Apr 12 '23 at 07:37
The replacement seems to be `remotes::install_github("moodymudskipper/fastpipe")` – Julien Apr 12 '23 at 07:46
I did a lot of pipe packages and I didn't maintain them very well. I'm not touching those anymore so you can trust that whatever you have now will stay. See also my {boomer} package for this kind of things. {fastpipe} is actually much slower than latest {magrittr} versions and the base pipe has no overhead so should be favoured if speed is crucial. – moodymudskipper Apr 12 '23 at 08:06
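
To make the "some basic string manipulation and a call to `system.time`" remark concrete, here is a minimal base-R sketch of how a timing pipe of this kind could work. It is not the source of `%L>%`: the operator name `%timed>%` and the internal name `.piped` are made up for illustration, and the sketch only inserts the left-hand side as the first argument (no `.` placeholder support).

```r
# Hypothetical timing pipe, for illustration only; NOT the code behind pipes::`%L>%`.
`%timed>%` <- function(lhs, rhs) {
  step <- substitute(rhs)    # the unevaluated call, e.g. head(n = 3)
  stopifnot(is.call(step))   # the step must be written as a call: f(), not f
  value <- lhs               # force earlier steps first so they are timed on their own
  # Rebuild the call with the piped value inserted as its first argument.
  call <- as.call(c(list(step[[1]], quote(.piped)), as.list(step)[-1]))
  env <- new.env(parent = parent.frame())
  env$.piped <- value
  elapsed <- system.time(result <- eval(call, env))[["elapsed"]]
  message(deparse(step)[1], "  ~  ", elapsed, " sec")
  result
}

out <- as.data.frame(Titanic) %timed>%
  subset(Freq > 0) %timed>%
  head(n = 3)
```

Each step prints one line with its elapsed time before the final value is returned, so the only overhead per step is one `substitute()`, one `deparse()` and one `system.time()` call.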

DJV · Answer 2 · score 4 · 2019-04-05T20:37:09.713

You can use the package profvis:

library(tidyverse)
library(profvis)

profvis({
  Titanic %>%
    as.data.frame() %>%
    mutate(Dataset = 1) %>%
    bind_rows(as.data.frame(Titanic)) %>%
    mutate_all(list(~replace_na(., NA))) %>%  # funs() is deprecated; use a list of formulas
    filter(Dataset != 1)
})

edited Apr 05 '19 at 20:37

answered Apr 05 '19 at 20:28

DJV

Maybe I missed something, but I'm not sure `profvis` does exactly what I need: it didn't give me the running time per line of piped code. Using `%L>%` seems more appropriate. – demarsylvain Apr 09 '19 at 18:03

tubaguy · Answer 3 · 2020-02-21T13:27:52.813

Here's one option that worked for me (I edited your NA replacement since funs is soft-deprecated). Admittedly, it's pretty lengthy:

library(dplyr)
library(magrittr)
library(tictoc)

Titanic %T>%
  {tic("as.data.frame")} %>%
  as.data.frame() %T>%
  {toc(); tic("mutate")} %>%
  mutate(Dataset = 1) %T>%
  {toc(); tic("bind.rows")} %>%
  bind_rows(as.data.frame(Titanic)) %T>%
  {toc(); tic("replace.na")} %>%
  replace(is.na(.), 0) %T>% 
  {toc(); tic("filter")} %>%
  filter(Dataset != 1) %T>%
  {toc(); tic("head")} %>%
  head() %T>%
  {toc()}

as.data.frame: 0 sec elapsed
mutate: 0 sec elapsed
bind.rows: 0 sec elapsed
replace.na: 0 sec elapsed
filter: 0 sec elapsed
head: 0 sec elapsed
  Class    Sex   Age Survived Freq Dataset
1   1st   Male Child       No    0       0
2   2nd   Male Child       No    0       0
3   3rd   Male Child       No   35       0
4  Crew   Male Child       No    0       0
5   1st Female Child       No    0       0
6   2nd Female Child       No    0       0
