12

Is there a command that can be added to a tidyverse pipeline without breaking the flow, but that produces a side effect such as printing something out? The use case I have in mind is a pipeline like this:

data %>%
  mutate(new_var = <some time consuming operation>) %>%
  mutate(new_var2 = <some other time consuming operation>) %>%
  ...

I would like to add some command to the pipeline that would not modify the end result, but would print out some progress or the state of things. Maybe something like this:

data %>%
  mutate(new_var = <some time consuming operation>) %>%
  command_x(print("first operation done")) %>%
  mutate(new_var2 = <some other time consuming operation>) %>%
  ...

Does such a command_x already exist?

Henrik
Raivo Kolde
  • 1
    Please use reproducible examples in your questions – Hack-R Sep 08 '17 at 19:22
  • 1
    Related https://stackoverflow.com/q/30119628/ Luke's answer there is the idiomatic way, I think. – Frank Sep 08 '17 at 19:33
  • The `%T>%` is almost what I'm looking for, but it would be nice to have a function that returns its first argument and as a second argument would take an expression on the data given in first, like other dplyr functions do. I think I saw something like that somewhere, but might be wrong. – Raivo Kolde Sep 08 '17 at 19:44
  • 2
    You could just write ```pipeable_command_x = function(df, other_args){command_x(other_args); return(df)}``` and use that – rsmith54 Sep 08 '17 at 20:25
  • 1
    Also look at the `tidylog` package which prints a status upon completion of each operation. – Jonas Lindeløv Feb 09 '21 at 16:59

3 Answers

13

For the specific case of printing an intermediate step in the pipeline, just use %>% print() %>%. E.g.,

mtcars %>%
  filter(cyl == 4) %>%
  print() %>%
  summarise(mpg = mean(mpg))

For a simple status message, either load the tidylog package with library(tidylog), which reports on each operation as it completes, or do it manually:

pipe_message = function(.data, status) {message(status); .data}
mtcars %>%
  filter(cyl == 4) %>%
  pipe_message("first operation done") %>%
  select(cyl)

See the answer by @MrFlick for a more general solution for non-print functions.
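If the status message should depend on the data at that point in the pipeline (as requested in the comments on the question), the same idea can be extended. The following is only a sketch: `pipe_status` and its `status_fun` argument are hypothetical names, not part of dplyr or tidylog.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or tidylog):
# `status_fun` receives the data and returns the text to report.
pipe_status <- function(.data, status_fun) {
  message(status_fun(.data))  # side effect only
  .data                       # pass the data on unchanged
}

result <- mtcars %>%
  filter(cyl == 4) %>%
  pipe_status(function(d) paste("rows after filter:", nrow(d))) %>%
  summarise(mpg = mean(mpg))
```

Because the helper returns `.data` untouched, removing the `pipe_status()` line does not change `result`.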

Jonas Lindeløv
  • It works well! I don't understand why it's not built into the package. Can I ask you a question, why do you add a call to `data` after the `print(data)` statement? `pipe_print = function(data) {print(data)}` also works. – Emy Apr 20 '21 at 12:23
  • You are right! This simplifies this case a whole lot as you can see from my updated answer. It's clear now why it's not built into dplyr. – Jonas Lindeløv Apr 21 '21 at 07:49
8

You could easily write your own function

pass_through <- function(data, fun) {fun(data); data}

And use it like

mtcars %>% pass_through(. %>% ncol %>% print) %>% nrow

Here we use the . %>% syntax to create an anonymous function. You could also write your own more explicitly with

mtcars %>% pass_through(function(x) print(ncol(x))) %>% nrow
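For comparison, the `%T>%` ("tee") operator from magrittr, mentioned in the comments on the question, gives similar behaviour without a helper function. A minimal sketch, assuming magrittr is installed; the braces matter, because without them the data would be passed as the first argument to `print()`:

```r
library(magrittr)

n_rows <- mtcars %T>%
  { print(ncol(.)) } %>%  # side effect: prints the column count
  nrow()                  # receives mtcars itself, not the printed value
```

The result of the braced expression is discarded, so `nrow()` still operates on `mtcars`.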
MrFlick
5

You can do it on the fly with an anonymous function:

mtcars %>% ( function(x){print(x); return(x)} ) %>% nrow()
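The same pattern should also work with the base R pipe. This is a sketch that assumes R 4.1 or later, where `|>` and the `\(x)` lambda shorthand are available; note that the native pipe needs the trailing `()` to call the parenthesised function:

```r
# Assumes R >= 4.1 for `|>` and `\(x)`.
n_rows <- mtcars |>
  (\(x) { message("columns: ", ncol(x)); x })() |>  # report, then return x unchanged
  nrow()
```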
GitHunter0