I'd like to use $ at the end of a magrittr/tidyverse pipeline. $ works directly next to tidyverse functions like read_csv and filter, but as soon as I create a pipeline with %>% it raises an error. Here is a simple reproducible example.

# Load libraries and create a dummy data file
library(dplyr)
library(readr)
write_csv(data.frame(x = c(0, 1), y = c(0, 2)), 'tmp.csv')

# This works
y <- read_csv('tmp.csv')$y
str(y)

# This also works
df_y <- read_csv('tmp.csv')
y <- filter(df_y, y > 0)$y
str(y)

# This does not work
y <- read_csv('tmp.csv') %>% filter(y > 0)$y

My questions are:

1) What are the underlying mechanics that explain why using $ at the end of a pipeline does not work?

2) What's the best practice for what I am trying to accomplish, specifically getting a vector as the end result of a pipeline?

jmuhlenkamp
    There's some overlap w/ https://stackoverflow.com/questions/21618423. It doesn't address your question 1. But it provides alternatives to the [`dplyr::pull()`](https://github.com/tidyverse/dplyr/issues/2054) I advocated below. – wibeasley Jan 06 '18 at 19:19

2 Answers

It does not work because $ binds more tightly than %>%, so magrittr treats $, not filter, as the function on the right-hand side and tries to run:

"$"(., filter(y > 0), y)

which fails, because $ accepts exactly two arguments.
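
One way to check this grouping yourself is to capture the failing expression from the question with quote() and inspect its pieces. This is only a minimal sketch in base R; the variable names (expr, top_fn, rhs, rhs_fn) are made up for illustration, and nothing is evaluated, so no packages or data file are needed.

```r
# Capture the failing expression without evaluating it
expr <- quote(read_csv('tmp.csv') %>% filter(y > 0)$y)

# The outermost call is %>%, because $ was grouped before the pipe
top_fn <- as.character(expr[[1]])

# The right-hand side of the pipe is a call to $, not a call to filter
rhs <- expr[[3]]
rhs_fn <- as.character(rhs[[1]])

print(top_fn)    # name of the outermost function
print(rhs_fn)    # name of the function the pipe actually receives
print(rhs[[2]])  # first argument of $: the filter(y > 0) call
```

Since the pipe receives a call to $, it inserts the left-hand side as the first argument of $ rather than of filter.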

Suppose DF is as shown below. Then any of the subsequent lines of code work as expected:

DF <- data.frame(y = seq(-3, 3))

DF %>% filter(y > 0) %>% "$"(y)
## [1] 1 2 3

DF %>% { filter(., y > 0)$y }
## [1] 1 2 3

DF %>% filter(y > 0) %>% "[["("y")
## [1] 1 2 3

library(magrittr) # supplies extract2 as an alias for [[
DF %>% filter(y > 0) %>% extract2("y")
## [1] 1 2 3
G. Grothendieck

question 1: I think the problem is grouping: $ is applied to filter(y > 0) before %>% runs. Enclose the whole pipeline in parentheses, and it produces the same result as your first two approaches:

y <- (read_csv('tmp.csv') %>% filter(y > 0))$y

question 2: the newish function dplyr::pull() is my preference for pulling out a single vector, instead of returning an entire data.frame.

read_csv('tmp.csv') %>% 
  filter(y > 0) %>% 
  dplyr::pull(y)

The older way was to treat the data.frame as a list and pull out a single element. The dot on the last line is magrittr's placeholder for the value coming from the left-hand side of the pipe.

read_csv('tmp.csv') %>% 
  filter(y > 0) %>% 
  .[["y"]]
wibeasley