When I don't use a pipe, I can change the original data frame with these commands:

df <- slice(df, -(1:3))                   # delete top 3 rows
df <- select(df, -c(Col1, Col50, Col51))  # delete specific columns

How would one do this with a pipe? I tried the following, but slice() and select() only return a modified copy; the original data frame is unchanged.

df%>%
  slice(-c(1:3))%>% 
  select(-c(Col1,Col50,Col51))

I'd like to change the original df.

Rich Scriven
Silver.Rainbow

1 Answer

You can definitely do the assignment by using an idiom such as df <- df %>% ... or df %>% ... -> df. But you could also avoid redundancy (i.e., stating df twice) by using the magrittr compound assignment operator %<>% at the beginning of the pipe.
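As a minimal sketch of the two explicit-assignment idioms, assuming a small made-up data frame (the names toy, df1 and df2 and the column choices are only for illustration, not from the question):

```r
library(dplyr)

# Hypothetical toy data: 5 rows, 4 columns
toy <- data.frame(Col1 = 1:5, Col2 = 6:10, Col3 = 11:15, Col4 = 16:20)
df1 <- toy
df2 <- toy

# Idiom 1: left assignment, df <- df %>% ...
df1 <- df1 %>%
  slice(-(1:3)) %>%
  select(-c(Col1, Col4))

# Idiom 2: right assignment, df %>% ... -> df
df2 %>%
  slice(-(1:3)) %>%
  select(-c(Col1, Col4)) -> df2

# Both objects now hold the last 2 rows and only Col2 and Col3
names(df1)
names(df2)
```

Either form overwrites the object; the only cost is that the name appears twice.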

From the magrittr vignette:

The compound assignment pipe operator %<>% can be used as the first pipe in a chain. The effect will be that the result of the pipeline is assigned to the left-hand side object, rather than returning the result as usual.

So with your code, we can do

library(magrittr)  ## came with your dplyr install
df %<>% slice(-(1:3)) %>% select(-c(Col1, Col50, Col51))

This pipes df into the expression and updates df as the result.
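To check this on something reproducible, here is a small sketch. The 5-by-51 data frame is invented so that Col1, Col50 and Col51 exist; your real df will differ.

```r
library(dplyr)
library(magrittr)

# Hypothetical stand-in for the real data: 5 rows, columns Col1 ... Col51
df <- as.data.frame(matrix(seq_len(5 * 51), nrow = 5))
names(df) <- sprintf("Col%d", seq_len(ncol(df)))

dim(df)  # 5 rows, 51 columns before the pipe

# %<>% assigns the result of the whole chain back to df
df %<>% slice(-(1:3)) %>% select(-c(Col1, Col50, Col51))

dim(df)  # 2 rows, 48 columns afterwards
```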

Update: In the comments you note an issue setting the column names. Fortunately magrittr has provided functions for setting attributes in a pipe. Try the following.

df %<>% 
    set_colnames(sprintf("Col%d", 1:ncol(.))) %>% 
    slice(-(1:3)) %>%
    select(-c(Col1,Col50,Col51))

Note that since we have a data frame, we can also use setNames() (stats) or set_names() (magrittr) in place of set_colnames().
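A quick sketch of that equivalence, assuming a tiny made-up data frame and that only magrittr is attached (other packages, such as purrr or rlang, export their own set_names(), which could mask magrittr's):

```r
library(magrittr)

# Hypothetical data frame with default names V1, V2, V3
df <- as.data.frame(matrix(1:6, nrow = 2))
new_names <- sprintf("Col%d", seq_len(ncol(df)))

by_set_colnames <- df %>% set_colnames(new_names)  # magrittr alias for `colnames<-`
by_setNames     <- df %>% setNames(new_names)      # stats
by_set_names    <- df %>% set_names(new_names)     # magrittr alias for `names<-`

# All three carry the same column names; df itself is untouched
names(by_set_colnames)
names(by_setNames)
names(by_set_names)
```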


Thanks to Steven Beaupre for adding the note from the vignette.

Rich Scriven
  • Can one have a general R statement in between? When I add a colnames function call in between the piped statements, I get an error: `df%<>% colnames(df)<-vector_columnnames%>% slice(-c(1:3))%>% select(-c(Col1,Col50,Col51))` – Silver.Rainbow Oct 25 '15 at 22:37
  • I apologize for the follow-up questions, but since this thread will probably be useful to others, I'm putting it here. What if I want to make a vector assignment that is not part of the functions in the dplyr/magrittr packages, like this: `df%<>% vector_columnnames<-sprintf("Col%d",1:length(df))%>% set_colnames(vector_columnnames) %>% slice(-c(1:3))%>% # delete top 3 rows select(-c(Col1,Col50,Col51))`? The vector_columnnames statement is not executed. Any suggestions on how one would essentially set a temporary vector while piping? – Silver.Rainbow Oct 25 '15 at 23:04
  • if `df` is a data frame wouldn't `setNames()` work as well as `set_colnames()` ? – Ben Bolker Oct 25 '15 at 23:09
  • Yes, setNames also works. set_colnames is a magrittr 'extract' alias. – Silver.Rainbow Oct 25 '15 at 23:15
  • It's worth noting that non-mutability is a really valuable programming strategy. Basically, never over-write an object. That way, you can re-run and debug any part of your code in any order. So maybe name the first one df.original and the second df.final, and avoid %<>% altogether. – bramtayl Oct 26 '15 at 00:10
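
The follow-up questions in the comments can be sketched together. An assignment statement cannot sit between pipe steps, but the temporary vector can either be created before the pipe or inside a braces block, where magrittr binds the incoming value to the dot. The sketch below also follows the non-overwriting suggestion by writing results to new names. The 5-by-51 data frame and the names df_original, df_final and df_final2 are assumptions for illustration only.

```r
library(dplyr)
library(magrittr)

# Hypothetical stand-in data: 5 rows, default names V1 ... V51
df_original <- as.data.frame(matrix(seq_len(5 * 51), nrow = 5))

# Option 1: build the names vector before the pipe
vector_columnnames <- sprintf("Col%d", seq_along(df_original))

df_final <- df_original %>%
  set_colnames(vector_columnnames) %>%
  slice(-(1:3)) %>%                  # delete top 3 rows
  select(-c(Col1, Col50, Col51))     # delete specific columns

# Option 2: create the temporary vector inside a braces block;
# `.` is the data frame coming down the pipe
df_final2 <- df_original %>%
  {
    tmp_names <- sprintf("Col%d", seq_along(.))
    set_colnames(., tmp_names)
  } %>%
  slice(-(1:3)) %>%
  select(-c(Col1, Col50, Col51))

dim(df_original)  # still 5 rows, 51 columns
dim(df_final)     # 2 rows, 48 columns
```

Because df_original is never overwritten, any step can be re-run in isolation while debugging.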