I am writing a function with several pipes. I would like to save some of the intermediate steps as a tbl or data frame before the last pipe. For instance, in a %>% b %>% c, I would like to save the result of step 'c', but I also want the result of step 'b'.

I know that one option is to use two separate pipes, but I believe there must be a better way.

library(dplyr)

cars %>% mutate(kmh = dist/speed) %>% summary()
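For comparison, the "two pipes" option mentioned above could look like the following minimal sketch. The names `b` and `c_out` are only illustrative: the intermediate result is bound to its own variable, and the second pipeline starts from it.

```r
library(dplyr)

# Step "b": run the first part of the pipeline and keep its result
b <- cars %>% mutate(kmh = dist / speed)

# Step "c": continue the pipeline from the saved intermediate result
c_out <- b %>% summary()

head(b)
c_out
```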

MKR
Felipe Dalla Lana
  • Load magrittr and use `%T>%`? I'm not clear on what you mean by "save" here / what the desired result is. – Frank Apr 19 '18 at 16:18
  • The cars example is just a generic example. In my real work, I have over 500k weather observations (one every 15 min), so I want to first summarise by hour and save, then by day and save, and finally by month. Each of these outputs will be used in a different analysis. – Felipe Dalla Lana Apr 19 '18 at 16:32
  • Why does it need to be all in one line? – cparmstrong Apr 19 '18 at 16:46

3 Answers

Thanks for the help. I found a better solution using braces `{}` and `->>`. See below:

library(dplyr)

c <- cars %>% mutate(var1 = dist * speed) %>%
  {. ->> b } %>%   # save the intermediate result as `b` here
  summary()
c
head(b)
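One thing worth knowing about `->>`: it does not assign in the current environment. It searches the parent environments for an existing `b` and, if none is found, creates `b` in the global environment. That is convenient at the top level, but inside a function the saved step silently ends up outside the function. A minimal sketch, wrapping the pipeline above in a hypothetical `summarise_cars()` function (the function and `b_saved` are illustrative names, not part of the answer):

```r
library(dplyr)

# Hypothetical wrapper around the pipeline from the answer above
summarise_cars <- function(df) {
  df %>%
    mutate(var1 = dist * speed) %>%
    {. ->> b_saved } %>%   # b_saved is not defined in any parent env, so ->> creates it globally
    summary()
}

res <- summarise_cars(cars)

# The intermediate data frame now lives in the global environment,
# not inside the function
exists("b_saved", envir = globalenv(), inherits = FALSE)
head(b_saved)
```

If the function should hand back the intermediate step instead, returning a list, as in the list-based answer further down, avoids this global side effect.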
Felipe Dalla Lana
Not sure why one would need it, but as @Frank suggested, one option is to use the `%T>%` (tee) operator from the magrittr package along with the `assign` function to store intermediate values.

In the code below, `SummaryVal` will hold the summary of `cars`, and `MyValue` will hold the intermediate value after `mutate`.

library(tidyverse)
library(magrittr)

SummaryVal <- cars %>% mutate(kmh = dist/speed) %T>% 
              assign("MyValue",.,envir = .GlobalEnv) %>% 
              summary()

head(MyValue)
#   speed dist       kmh
# 1     4    2 0.5000000
# 2     4   10 2.5000000
# 3     7    4 0.5714286
# 4     7   22 3.1428571
# 5     8   16 2.0000000
# 6     9   10 1.1111111

SummaryVal
#    speed           dist             kmh       
# Min.   : 4.0   Min.   :  2.00   Min.   :0.500  
# 1st Qu.:12.0   1st Qu.: 26.00   1st Qu.:1.921  
# Median :15.0   Median : 36.00   Median :2.523  
# Mean   :15.4   Mean   : 42.98   Mean   :2.632  
# 3rd Qu.:19.0   3rd Qu.: 56.00   3rd Qu.:3.186  
# Max.   :25.0   Max.   :120.00   Max.   :5.714 
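The difference between the two operators is what gets passed on: `%>%` forwards the result of the right-hand side, while `%T>%` calls the right-hand side only for its side effect and forwards the left-hand side unchanged. A minimal illustration with a plain numeric vector:

```r
library(magrittr)

x <- c(1, 4, 9)

# %>% forwards the value returned by sqrt()
piped <- x %>% sqrt()

# %T>% ("tee") calls sqrt() but forwards x itself
teed <- x %T>% sqrt()

piped  # 1 2 3
teed   # 1 4 9
```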

UPDATED: As @Renu correctly pointed out, even `%>%` works:

SummaryVal <- cars %>% mutate(kmh = dist/speed) %>% 
              assign("MyValue",.,envir = .GlobalEnv) %>% 
              summary()
MKR
  • Good solution. In this case `%T>%` is unnecessary, since the result is the same even if you just use `%>%`. – IceCreamToucan Apr 19 '18 at 18:13
  • @Renu Valid point. Indeed it will work, though I'm not sure why, as `assign` doesn't return anything. With that in mind I thought `%T>%` would be a good option. – MKR Apr 19 '18 at 18:21
  • It actually does. See `a <- assign('b', 4)`. Same concept as `a <- b <- 4`. – IceCreamToucan Apr 19 '18 at 18:30
  • @Renu That's true. If you just run `assign('test', 2)` at the command line, nothing is printed. The documentation doesn't mention the return value either, but it does return one. – MKR Apr 19 '18 at 18:34
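The point settled in these comments can be checked directly: `assign()` returns the assigned value, but invisibly, which is why nothing is printed at the console and why a plain `%>%` still receives the data frame. Base R's `withVisible()` shows both parts (the variable names follow the comments and are only illustrative):

```r
# assign() returns its value, so it can sit in the middle of a pipe
a <- assign("b", 4)
a  # 4

# ...but the value is returned invisibly, so nothing prints at the console
v <- withVisible(assign("test", 2))
v$value    # 2
v$visible  # FALSE
```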
Lists and a function are the way to go. This makes debugging easy and is still readable. Here is a small example. You will need to include some error handling in the function to make sure the data you give it is what you expect. The function returns a list with the results. In case you want separate objects instead of one big list, the last line of code assigns each element of the list (the mutated data frame and the summary table) to its own object in the global environment.

library(dplyr)

# create a function
my_summaries <- function(x){
  # error handling goes here
  my_mutate <- x %>% mutate(kmh = dist/speed)
  my_summary <- my_mutate %>% summary()
  list(mutate = my_mutate, summary = my_summary)
}

my_data <- my_summaries(cars)

str(my_data)
List of 2
 $ mutate :'data.frame':    50 obs. of  3 variables:
  ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
  ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
  ..$ kmh  : num [1:50] 0.5 2.5 0.571 3.143 2 ...
 $ summary: 'table' chr [1:6, 1:3] "Min.   : 4.0  " "1st Qu.:12.0  " "Median :15.0  " "Mean   :15.4  " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:3] "    speed" "     dist" "     kmh"


# Assign each list element (mutate, summary) to its own object in the global environment
list2env(my_data, envir = .GlobalEnv)
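The same list-returning approach extends to the use case from the comments under the question (15-minute weather readings summarised by hour, then by day, then by month). The sketch below assumes a hypothetical data frame `weather` with a POSIXct column `time` and a numeric column `temp`, filled with made-up values, and dplyr 1.0.0 or later for the `.groups` argument. Note that averaging hourly means only reproduces the daily mean of the raw readings when every hour has the same number of observations.

```r
library(dplyr)

# Hypothetical input: one reading every 15 minutes for three days (made-up values)
n <- 4 * 24 * 3
weather <- data.frame(
  time = seq(as.POSIXct("2018-01-01 00:00", tz = "UTC"), by = "15 min", length.out = n),
  temp = seq_len(n) %% 10
)

# Return every aggregation level in one list, as in the answer above
weather_summaries <- function(x) {
  hourly <- x %>%
    mutate(hour = format(time, "%Y-%m-%d %H")) %>%
    group_by(hour) %>%
    summarise(temp = mean(temp), .groups = "drop")
  daily <- hourly %>%
    mutate(day = substr(hour, 1, 10)) %>%    # "YYYY-mm-dd"
    group_by(day) %>%
    summarise(temp = mean(temp), .groups = "drop")
  monthly <- daily %>%
    mutate(month = substr(day, 1, 7)) %>%    # "YYYY-mm"
    group_by(month) %>%
    summarise(temp = mean(temp), .groups = "drop")
  list(hourly = hourly, daily = daily, monthly = monthly)
}

res <- weather_summaries(weather)
sapply(res, nrow)  # hourly 72, daily 3, monthly 1
```

If separate objects are preferred, `list2env(res, envir = .GlobalEnv)` creates `hourly`, `daily`, and `monthly`, just like the last line of the answer.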
phiver