pass_through <- function(data, fun) {fun(data); data}

# from an answer to "Printing intermediate results without breaking pipeline in tidyverse"

mtcars %>% filter(mpg>15) %>% pass_through(. %>% nrow %>% print)

With the code above, I can print the number of rows after filtering. However, I cannot print the difference in the number of rows between the original data and the filtered data:

> mtcars %>% filter(mpg>15) %>% pass_through(. %>% nrow %>% print(.-nrow(mtcars)))

Error in print.default(., . - nrow(mtcars)) : invalid printing digits -6

Question 1: Is there a way to print this difference without using extra variables or breaking the pipeline?

Question 2: Is there a way to print the difference in row counts between the n-th and (n+1)-th steps of the pipeline, again without extra variables or breaking the pipeline?

For example, using the code from Gregor Thomas's answer:

mtcars %>%
  filter(mpg > 30) %T>%  # call the output of this step y
  {\(x) (nrow(mtcars) - nrow(x)) %>% print}() %>%
  filter(cyl > 5) %T>%
  {\(x) (nrow(y) - nrow(x)) %>% print}()
# I know `y` is not defined here, so this is not valid code
doraemon

2 Answers


I'd suggest using the magrittr %T>% "tee" pipe for the pass-through, with an anonymous function expression:

library(dplyr)     # for filter()
library(magrittr)  # for the %T>% tee pipe

mtcars %>%
  filter(mpg > 30) %T>%
  {\(x) (nrow(mtcars) - nrow(x)) %>% print}()
# [1] 28
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
# Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
# Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
Gregor Thomas
  • Thank you for your reply. Would you mind answering one more question? Is there a way to check the difference between the n-th and (n+1)-th steps of the pipeline? – doraemon Dec 09 '22 at 03:19
  • I think you might want to check out the [tidylog package](https://github.com/elbersb/tidylog). You could maybe work out some function that wraps each step of the pipeline, but that sounds really clunky. And I think `tidylog` does pretty much what you're after. – Gregor Thomas Dec 09 '22 at 03:25
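
As a side note on the error in the question: `x %>% print(. - nrow(mtcars))` still inserts `x` as the first argument, so the call becomes `print(x, x - nrow(mtcars))`, and the difference (-6) is matched to the `digits` argument of `print.default()`. A minimal sketch of one way around this, assuming the `pass_through()` helper from the question, is to compute the difference inside braces, which stops magrittr from inserting the first argument:

```r
library(dplyr)

# Helper copied from the question: call fun for its side effect, return data
pass_through <- function(data, fun) {fun(data); data}

# Inside braces, `.` is only used where it is written explicitly
res <- mtcars %>%
  filter(mpg > 15) %>%
  pass_through(. %>% nrow %>% {nrow(mtcars) - .} %>% print)
# [1] 6
```

The filtered data still flows through unchanged, so `res` holds the 26 remaining rows.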

You can change the function as follows:

library(dplyr)

filter_pass_through <- function(data, ...) {
  res <- filter(data, ...)
  print(nrow(data) - nrow(res))
  res
}

mtcars %>% filter_pass_through(mpg>15) 

#[1] 6
#                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#...
#...
Ronak Shah
  • Thank you for your answer^^ What if I want to compare the number of rows between the 2nd and 3rd steps of the pipeline? – doraemon Dec 09 '22 at 03:13
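
One possible way to compare consecutive steps is to generalize the wrapper idea above so that each step reports the rows it removed relative to its own input. The sketch below is only an illustration: `log_rows()` is a hypothetical helper name, not part of dplyr or magrittr, and it assumes the step is a function that takes the data as its first argument.

```r
library(dplyr)

# Hypothetical helper (not from any package): run one pipeline step and
# report how many rows it removed compared with the data it received.
log_rows <- function(data, step, ...) {
  res <- step(data, ...)
  message(deparse(substitute(step)), ": removed ",
          nrow(data) - nrow(res), " rows")
  res
}

out <- mtcars %>%
  log_rows(filter, mpg > 30) %>%  # compared with mtcars (32 rows)
  log_rows(filter, cyl > 5)       # compared with the previous step (4 rows)
# filter: removed 28 rows
# filter: removed 4 rows
```

Because each call sees both its input and its output, no extra variable such as `y` is needed, and the pipeline stays unbroken.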