R: Further subset a selection using the pipe %>% and placeholder

Question

I recently discovered the pipe operator %>%, which can make code more readable. Here is my MWE.

library(dplyr)                                          # for the pipe operator
library(lsr)                                            # for the cohensD function

set.seed(4)                                             # make it reproducible
dat <- data.frame(                                      # create data frame
    subj = c(1:6),
    pre  = sample(1:6, replace = TRUE),
    post = sample(1:6, replace = TRUE)
)

dat %>% select(pre, post) %>% sapply(., mean)           # works as expected

However, I struggle to use the pipe operator in this particular case:

dat %>% select(pre, post) %>% cohensD(.$pre, .$post)    # piping returns an error
cohensD(dat$pre, dat$post)                              # classical way works fine

Why is it not possible to subset columns using the placeholder . in combination with $? Is it worthwhile to write this line using the pipe operator %>%, or does it complicate the syntax? The classical way of writing this seems more concise.

You probably get an error because the `%>%` pipe operator pipes the left-hand-side as the first argument of the right-hand-side. But it seems that the `cohensD` function doesn't have a first argument that accepts a data.frame. IMO it's cleaner to write this in base R syntax — talat, Jul 20 '16 at 07:37
This would work: `dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}`. It makes the last call be treated like an expression and not a function. When you pipe something into an expression, the `.` gets replaced as expected. I often use this trick to call a function which does not interface well with piping. — asachet, Jul 20 '16 at 07:47

score 12 · Accepted Answer · answered Jul 20 '16 at 07:53

This would work:

dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}

Wrapping the last call in curly braces makes it be treated as an expression rather than a function call. When you pipe something into an expression, the . is replaced as expected. I often use this trick to call a function that does not interface well with piping.

What is inside the braces happens to be a function call, but it could really be any expression involving the placeholder.
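
A minimal sketch of the brace trick, using only magrittr. The function mean_diff and the toy data below are hypothetical stand-ins (not from the question); mean_diff plays the role of a two-vector function such as cohensD(x, y).

```r
library(magrittr)                                   # provides %>%

# Hypothetical stand-in for a function that takes two vectors, like cohensD(x, y)
mean_diff <- function(x, y) mean(x) - mean(y)

# Toy data (assumed for illustration only)
toy <- data.frame(pre = c(1, 2, 3), post = c(2, 4, 6))

# Braces: the block is evaluated with . bound to toy,
# and nothing is inserted as a first argument
res_braces <- toy %>% { mean_diff(.$pre, .$post) }

# The classical call gives the same value
res_direct <- mean_diff(toy$pre, toy$post)

# Any expression involving . is allowed inside the braces
res_expr <- toy %>% { mean(.$pre) - mean(.$post) }
```

All three results should be identical, since the braces only change how the left-hand side is handed over, not what is computed.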

score 6 · Answer 2 · Marijn Stevering · 2016-07-20T09:31:19.170

Since you're going from a bunch of data to one (row of) value(s), you're summarizing. In a dplyr pipeline you can use the summarize function for this. Inside summarize you don't need to subset and can refer to pre and post directly.

Like so:

dat %>% select(pre, post) %>% summarize(CD = cohensD(pre, post))

(The select statement isn't actually necessary in this case, but I left it in to show how this works in a pipeline)
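
A small sketch of why no subsetting is needed inside summarize: column names are looked up in the piped data frame. The function mean_diff and the toy data are hypothetical stand-ins (assumptions, not part of the answer); any function returning a single number from two vectors would behave the same way.

```r
library(dplyr)                                      # provides %>% and summarize

# Hypothetical stand-in for a function returning one number from two vectors
mean_diff <- function(x, y) mean(x) - mean(y)

# Toy data (assumed for illustration only)
toy <- data.frame(subj = 1:3, pre = c(1, 2, 3), post = c(2, 4, 6))

# pre and post are resolved as columns of toy; no $ or . is required
out <- toy %>% summarize(MD = mean_diff(pre, post))
```

The result is a one-row data frame with a single column MD, which can be piped into further dplyr steps if needed.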

edited Jul 20 '16 at 09:31

answered Jul 20 '16 at 07:56

Marijn Stevering


I think you don't need to subset explicitly with `$`; this should be enough: `dat %>% summarize(CD = cohensD(pre, post))` – Drey Jul 20 '16 at 08:02
You're totally right, I remembered I had used subsetting inside a pipeline before and was so focused on making the subset work, I didn't think of just not subsetting in the first place, thanks! – Marijn Stevering Jul 20 '16 at 08:05
Why is it not working with the `cohen.d` function of the `library(effsize)` package? `dat %>% summarize(CD = cohen.d(pre, post))` returns an error. – piptoma Jul 20 '16 at 09:02
The `cohen.d` function returns a list with additional information (confidence intervals, etc.), whereas summarize expects just one number. You can make it work by extracting just the estimate from the list: `summarize(CD = cohen.d(pre, post)$estimate)` – Marijn Stevering Jul 20 '16 at 09:08
Works perfectly, thanks. As I understand your suggested answer, the `select(pre, post)` part is not needed (cf. Drey's answer). Do you consider it necessary or could you delete it? – piptoma Jul 20 '16 at 09:13
It isn't necessary, but if there are no previous steps, the whole pipeline is unnecessary and you can just call `summarize(dat, CD = cohensD(pre, post))`. – Marijn Stevering Jul 20 '16 at 09:26
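
To illustrate the `$estimate` point from the comments above, here is a sketch with a hypothetical function effect_list that returns a list, loosely imitating the shape described for `cohen.d`. It is not the effsize implementation, and the element names other than `estimate` are assumptions.

```r
library(dplyr)                                      # provides %>% and summarize

# Hypothetical function returning a list instead of a single number
effect_list <- function(x, y) {
  list(
    estimate = mean(x) - mean(y),                   # the one number we want
    n        = c(length(x), length(y))              # extra information (assumed)
  )
}

# Toy data (assumed for illustration only)
toy <- data.frame(pre = c(1, 2, 3), post = c(2, 4, 6))

# Extract the scalar element so that summarize receives a single value
out <- toy %>% summarize(ES = effect_list(pre, post)$estimate)
```

Pulling out one element with `$` inside summarize keeps the result a plain one-row, one-column data frame.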

score 1 · Answer 3 · answered Jul 20 '16 at 07:43

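It doesn't work because the . placeholder only replaces the default first-argument insertion when it is used directly as an argument. When it appears only inside a nested call (like .$pre), the left-hand side is still inserted as the first argument, so your call becomes cohensD(., .$pre, .$post).

If you really want to use piping, you can do it with the formula interface, after a little reshaping first (melt is from the reshape2 package):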
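
The first-argument rule can be checked with a small sketch. The helper n_args is hypothetical (an assumption for illustration); it simply counts how many arguments it receives.

```r
library(magrittr)                                   # provides %>%

# Hypothetical helper: counts the arguments it was called with
n_args <- function(...) length(list(...))

# . used directly as an argument: the call is n_args(1:3)
direct <- 1:3 %>% n_args(.)

# . only inside a nested call: the left-hand side is still inserted first,
# so the call is n_args(1:3, sum(1:3))
nested <- 1:3 %>% n_args(sum(.))

# Braces switch off the insertion: the call is n_args(sum(1:3))
braced <- 1:3 %>% { n_args(sum(.)) }
```

Here direct and braced each see one argument, while nested sees two, which is why a function that accepts only two vectors fails when it is given .$pre and .$post without braces.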

library(reshape2)                                       # for the melt function
dat %>% select(pre, post) %>% melt %>% cohensD(value ~ variable, .)
## [1] 0.8115027
