
I am trying to figure out how to add multiple columns returned from a function that takes one or more columns of the same data frame as input. Basically, I want mutate(), but with the option to attach a whole data frame the way left_join() does. I can do this with either left_join() or cbind(), but there must be a better way.

My actual code takes columns for revenues, costs, capital expenditures and other information, and calculates tax and other fiscal policy outputs. Because the outputs are interdependent, I need to compute them all in one function (I can't do it one variable at a time), and I really don't want to call the same function multiple times and then mutate column by column (although I could do that too).

Here's a really simple (contrived) example of what I want to do:

library(tidyverse)
library(lubridate)
library(nycflights13)
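# small data frame of New Year's Day flights from JFK
df1 <- flights %>% filter(year == 2013, month == 1, day == 1, origin == "JFK")
# test function: returns two new columns derived from a date-time vector
arr_gate_time <- function(time) {
  gate <- time - hours(1)      # one hour before the scheduled hour
  check_in <- time - hours(2)  # two hours before the scheduled hour
  data.frame(gate, check_in)
}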

What I want to be able to do is, within mutate, do something like this:

df_test_2 <- df1 %>% mutate(SOMETHING = arr_gate_time(time_hour))

But the closest I can get is:

df_test <- arr_gate_time(df1$time_hour)
df_test_2 <- cbind(df1, df_test)

I'm sure there's an easy implementation of dplyr to do this, but I can't figure out the right command structure.

Thanks!

Andrew Leach

1 Answer


tidyverse solution

library(tidyverse)
df_test_2 <- df1 %>% bind_cols(arr_gate_time(.$time_hour))
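A note for newer dplyr versions: assuming dplyr 1.0.0 or later, mutate() should accept the function call directly, because an unnamed expression that returns a data frame is unpacked into separate columns. A minimal sketch on a toy tibble (`toy` and its values are made up for illustration; the helper mirrors arr_gate_time() from the question):

```r
library(dplyr)
library(lubridate)

# Toy stand-in for df1 (hypothetical data, not the nycflights13 table)
toy <- tibble(
  flight = c(1L, 2L),
  time_hour = as.POSIXct(
    c("2013-01-01 05:00:00", "2013-01-01 06:00:00"),
    tz = "UTC"
  )
)

# Helper mirroring the question's arr_gate_time(): returns a two-column data frame
arr_gate_time <- function(time) {
  data.frame(gate = time - hours(1), check_in = time - hours(2))
}

# Leaving the result unnamed lets mutate() splice `gate` and `check_in`
# in as ordinary columns (assumes dplyr >= 1.0.0)
toy_2 <- toy %>% mutate(arr_gate_time(time_hour))
names(toy_2)
```

If you name the result instead (e.g. SOMETHING = arr_gate_time(time_hour)), you get a single data-frame column called SOMETHING rather than two plain columns, so the unnamed form is the one that matches the cbind() output.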

Base R (modifies original data frame)

df1[c('gate', 'check_in')] <- arr_gate_time(df1$time_hour)

data.table (modifies original data frame)

library(data.table)
setDT(df1)
df1[, c('gate', 'check_in') := arr_gate_time(time_hour)]
IceCreamToucan
  • Thanks, @Ryan. Do you know offhand whether the tidyverse or the base R solution would be faster? I'll test both with my real code as I work it out, unless it's obvious. – Andrew Leach Jul 02 '18 at 23:46
  • I believe the `tidyverse` solution makes a copy of `df1` even if you're assigning it back to `df1`, so the Base approach should be faster in this case. – IceCreamToucan Jul 02 '18 at 23:49
  • I'm not 100% sure. You could do some tests with the `microbenchmark` package. I would do a time comparison for this data and post it, but this data is too small for it to matter. – IceCreamToucan Jul 02 '18 at 23:51
  • The tidyverse version of `cbind` is `bind_cols` – alistaire Jul 02 '18 at 23:54
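
For anyone who wants to check the speed question themselves, something like the following sketch with `microbenchmark` could work. The `big` data frame and its size are hypothetical stand-ins for the real data, and the timings will depend on your machine and package versions, so treat this as a starting point rather than a verdict:

```r
library(dplyr)
library(lubridate)
library(microbenchmark)

# Hypothetical larger input so that timing differences become visible
n <- 1e5
big <- data.frame(
  id = seq_len(n),
  time_hour = as.POSIXct("2013-01-01 00:00:00", tz = "UTC") + 3600 * seq_len(n)
)

# Helper mirroring the question's arr_gate_time()
arr_gate_time <- function(time) {
  data.frame(gate = time - hours(1), check_in = time - hours(2))
}

# Compare the bind_cols() approach with base R column assignment
bench <- microbenchmark(
  bind_cols = bind_cols(big, arr_gate_time(big$time_hour)),
  base = {
    tmp <- big  # work on a copy so `big` is unchanged between runs
    tmp[c("gate", "check_in")] <- arr_gate_time(tmp$time_hour)
    tmp
  },
  times = 20
)
print(bench)
```

Most of the time in both variants is likely spent inside arr_gate_time() itself, so with a heavier real function the choice of binding method may matter less than it first appears.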