Stepping through a pipeline with intermediate results

Question

Is there a way to output the result of a pipeline at each step without doing it manually (e.g., without selecting and running each chunk by hand)?

I often find myself running a pipeline line-by-line to remember what it was doing or when I am developing some analysis.

For example:

library(dplyr)

mtcars %>% 
  group_by(cyl) %>% 
  sample_frac(0.1) %>% 
  summarise(res = mean(mpg))
# Source: local data frame [3 x 2]
# 
# cyl  res
# 1   4 33.9
# 2   6 18.1
# 3   8 18.7

I'd have to select and run:

mtcars %>% group_by(cyl)

and then...

mtcars %>% group_by(cyl) %>% sample_frac(0.1)

and so on...

But selecting code and pressing CMD/CTRL+ENTER in RStudio is tedious, and I'd like a more efficient method.

Can this be done in code?

Is there a function that takes a pipeline and runs it line by line, showing the output at each step in the console, so that you continue by pressing Enter as in demo(...) or example(...) from package guides?

Check out R's `debug()` function. It is close to what you want. You could use it with the `print()` statements. This post on [Cross Validated](http://stats.stackexchange.com/questions/13535/running-an-r-script-line-by-line) talks more about it. — Richard Erickson, May 08 '15 at 13:01
You can simply use `%>% print() %>%` - see this answer: https://stackoverflow.com/a/54075410/5535152 — Emy, Apr 25 '21 at 14:02

score 10 · Answer 1 · answered Jan 07 '17 at 17:48


You can select which results to print by using the tee-operator (%T>%) and print(). The tee-operator is used exclusively for side-effects like printing.

# i.e.
library(dplyr)
library(magrittr)  # dplyr re-exports %>% but not %T>%

mtcars %>%
  group_by(cyl) %T>% print() %>%
  sample_frac(0.1) %T>% print() %>%
  summarise(res = mean(mpg))
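To see why the tee-operator matters, here is a minimal sketch with a side-effect function that does not return its input. It assumes only that magrittr is attached; the variable names are made up for illustration.

```r
library(magrittr)

# str() prints a compact summary but returns NULL, so a plain %>% would
# hand NULL to the next step. %T>% hands over the original left-hand side.
x <- c(1, 4, 9)

roots <- x %T>% str() %>% sqrt()

roots  # the square roots of x: 1 2 3
```

With print() the difference is invisible because print() returns its argument, but with functions like str(), plot() or View() the tee-operator is what keeps the pipeline going.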

answered Jan 07 '17 at 17:48

seasmith


When the output is a dataframe I find it useful to use `%T>% View() %>%` to see the intermediate results – see24 Jul 11 '18 at 13:44

score 3 · Accepted Answer · answered May 08 '15 at 13:36

It is easy with a magrittr functional sequence (a pipeline that starts with a dot). For example, define a function my_chain with:

library(magrittr)  # provides %>% and functions()

foo <- function(x) x + 1
bar <- function(x) x + 1
baz <- function(x) x + 1
my_chain <- . %>% foo %>% bar %>% baz

and get the final result of a chain as:

my_chain(0)
# [1] 3

You can get a function list with functions(my_chain) and define a "stepper" function like this:

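stepper <- function(fun_chain, x, FUN = print) {
  # functions() returns the list of functions that make up the chain
  f_list <- functions(fun_chain)
  for(i in seq_along(f_list)) {
    x <- f_list[[i]](x)  # apply the next step
    FUN(x)               # show (or otherwise process) the intermediate result
  }
  invisible(x)  # return the final result without printing it again
}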

And run the chain with interposed print function:

stepper(my_chain, 0, print)

# [1] 1
# [1] 2
# [1] 3

Or wait for user input after each step:

stepper(my_chain, 0, function(x) {print(x); readline()})
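The same idea can be pointed at a data pipeline and used to keep the intermediates instead of printing them. This is only a sketch: collect_steps and cyl4_chain are hypothetical names, and base R verbs are used so that magrittr is the only dependency.

```r
library(magrittr)

# Hypothetical variant of the stepper: store each intermediate result
# instead of printing it.
collect_steps <- function(fun_chain, x) {
  f_list <- functions(fun_chain)  # magrittr::functions()
  results <- vector("list", length(f_list))
  for (i in seq_along(f_list)) {
    x <- f_list[[i]](x)
    results[i] <- list(x)  # the list() wrapper keeps NULL results in place
  }
  results
}

# Writing the pipeline with a leading dot turns it into a functional sequence.
cyl4_chain <- . %>% subset(cyl == 4) %>% head(3) %>% nrow()

steps <- collect_steps(cyl4_chain, mtcars)
length(steps)  # one entry per pipeline step
steps[[3]]     # the final result, same as cyl4_chain(mtcars)
```

The only change needed to an existing pipeline is replacing the leading data object (here mtcars) with a dot and passing the data to the stepper instead.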

score 2 · Answer 3 · answered May 08 '15 at 08:56


Add print:

mtcars %>% 
  group_by(cyl) %>% 
  print %>% 
  sample_frac(0.1) %>% 
  print %>% 
  summarise(res = mean(mpg))
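If the bare print output is hard to tell apart, a small wrapper can label each snapshot. A sketch, where print_step is a hypothetical helper and not part of dplyr or magrittr:

```r
library(magrittr)

# Hypothetical helper: print a labelled snapshot, then hand the data back
# unchanged so the pipeline can continue.
print_step <- function(x, label = "") {
  cat("--- ", label, " ---\n", sep = "")
  print(x)
  invisible(x)
}

n_rows <- mtcars %>%
  head(3) %>%
  print_step("after head(3)") %>%
  nrow()
```

Here n_rows ends up as 3, and the console shows the three-row data frame under its label.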

answered May 08 '15 at 08:56

zx8754


I get that print returns its argument and so this works but it's not really shorter/faster/more convenient than just hand selecting and running chunks. – andrew wong May 08 '15 at 09:27
@andrewwong Tell us more, why would you need to run it line by line, more importantly why would you want to look at print output one by one? – zx8754 May 08 '15 at 09:39

updated question. I want like an interactive stepper in the console or an auto-magic markdown document with the intermediates all generated. thanks for your thoughts! – andrew wong May 08 '15 at 09:48

score 2 · Answer 4 · edited May 23 '17 at 12:33

IMHO magrittr is mostly useful interactively, that is when I am exploring data or building a new formula/model.

In these cases, storing intermediate results in distinct variables is very time consuming and distracting, while pipes let me focus on data, rather than typing:

x %>% foo
## reason on results and 
x %>% foo %>% bar
## reason on results and 
x %>% foo %>% bar %>% baz
## etc.

The problem here is that I don't know in advance what the final pipe will be, as is assumed in @bergant's answer.

Typing, as in @zx8754,

x %>% print %>% foo %>% print %>% bar %>% print %>% baz

adds too much overhead and, to me, defeats the whole purpose of magrittr.

Essentially magrittr lacks a simple operator that both prints and pipes results.
The good news is that it seems quite easy to craft one:

`%P>%` <- function(lhs, rhs) { print(lhs); lhs %>% rhs }

Now you can print and pipe:

1:4 %P>% sqrt %P>% sum 
## [1] 1 2 3 4
## [1] 1.000000 1.414214 1.732051 2.000000
## [1] 6.146264

I found that if one defines key bindings for %P>% and %>%, the prototyping workflow is very streamlined (see Emacs ESS or RStudio).
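One caveat: as far as I can tell, the one-line definition of %P>% only works when the right-hand side is a bare function name such as sqrt or sum. A call with arguments, e.g. head(2), gets evaluated on its own before the piped value can be inserted. A possible workaround, sketched here under the assumption that magrittr is attached (.piped_value is just a made-up temporary name), is to capture the right-hand side unevaluated and rebuild the pipe call:

```r
library(magrittr)

# Sketch of a print-and-pipe operator that also accepts calls with arguments.
`%P>%` <- function(lhs, rhs) {
  print(lhs)
  rhs_expr <- substitute(rhs)               # capture the rhs unevaluated
  env <- new.env(parent = parent.frame())
  assign(".piped_value", lhs, envir = env)  # hypothetical temporary name
  # Build `.piped_value %>% <rhs>` and let magrittr evaluate it as usual
  eval(call("%>%", quote(.piped_value), rhs_expr), env)
}

n <- mtcars %P>% head(2) %P>% nrow()
```

This prints mtcars, then its first two rows, and leaves 2 in n. Bare function names still work as before.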

score 2 · Answer 5 · answered Apr 09 '19 at 19:18

I wrote the package pipes, which can do several things that might help:

use %P>% to print the output.
use %ae>% to use all.equal on input and output.
use %V>% to use View on the output, it will open a viewer for each relevant step.

If you want to see some aggregated info you can try %summary>%, %glimpse>% or %skim>%, which will use summary, tibble::glimpse or skimr::skim, or you can define your own pipe to show specific changes, using new_pipe.

# devtools::install_github("moodymudskipper/pipes")
library(dplyr)
library(pipes)

res <- mtcars %P>% 
  group_by(cyl) %P>% 
  sample_frac(0.1) %P>% 
  summarise(res = mean(mpg))
#> group_by(., cyl)
#> # A tibble: 32 x 11
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows
#> sample_frac(., 0.1)
#> # A tibble: 3 x 11
#> # Groups:   cyl [3]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  26       4  120.    91  4.43  2.14  16.7     0     1     5     2
#> 2  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#> 3  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#> summarise(., res = mean(mpg))
#> # A tibble: 3 x 2
#>     cyl   res
#>   <dbl> <dbl>
#> 1     4  26  
#> 2     6  17.8
#> 3     8  18.7

res <- mtcars %ae>% 
  group_by(cyl) %ae>% 
  sample_frac(0.1) %ae>% 
  summarise(res = mean(mpg))
#> group_by(., cyl)
#> [1] "Attributes: < Names: 1 string mismatch >"                                              
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"                     
#> [3] "Attributes: < Component \"class\": Lengths (1, 4) differ (string compare on first 1) >"
#> [4] "Attributes: < Component \"class\": 1 string mismatch >"                                
#> [5] "Attributes: < Component 2: Modes: character, list >"                                   
#> [6] "Attributes: < Component 2: Lengths: 32, 2 >"                                           
#> [7] "Attributes: < Component 2: names for current but not for target >"                     
#> [8] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"          
#> [9] "Attributes: < Component 2: target is character, current is tbl_df >"
#> sample_frac(., 0.1)
#> [1] "Different number of rows"
#> summarise(., res = mean(mpg))
#> [1] "Cols in y but not x: `res`. "                                                                
#> [2] "Cols in x but not y: `qsec`, `wt`, `drat`, `hp`, `disp`, `mpg`, `carb`, `gear`, `am`, `vs`. "

res <- mtcars %V>% 
  group_by(cyl) %V>% 
  sample_frac(0.1) %V>% 
  summarise(res = mean(mpg))
# you'll have to test this one by yourself
