
I recently discovered the pipe operator %>%, which can make code more readable. Here is my MWE.

library(dplyr)                                          # for the pipe operator
library(lsr)                                            # for the cohensD function

set.seed(4)                                             # make it reproducible
dat <- data.frame(                                      # create data frame
    subj = c(1:6),
    pre  = sample(1:6, replace = TRUE),
    post = sample(1:6, replace = TRUE)
)

dat %>% select(pre, post) %>% sapply(., mean)           # works as expected

However, I struggle to use the pipe operator in this particular case:

dat %>% select(pre, post) %>% cohensD(.$pre, .$post)    # piping returns an error
cohensD(dat$pre, dat$post)                              # classical way works fine

Why is it not possible to subset columns using the placeholder . in combination with $? Is it worthwhile to write this line using the pipe operator %>%, or does it complicate the syntax? The classical way of writing this seems more concise.

piptoma
  • You probably get an error because the `%>%` pipe operator pipes the left-hand-side as the first argument of the right-hand-side. But it seems that the `cohensD` function doesn't have a first argument that accepts a data.frame. IMO it's cleaner to write this in base R syntax – talat Jul 20 '16 at 07:37
  • This would work: `dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}`. It makes the last call be treated like an expression and not a function. When you pipe something into an expression, the `.` gets replaced as expected. I often use this trick to call a function which does not interface well with piping. – asachet Jul 20 '16 at 07:47

3 Answers


This would work:

dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}

Wrapping the last call in curly braces makes the pipe treat it as an expression rather than a function call, so the left-hand side is not inserted as the first argument. When you pipe something into an expression, the . is replaced as expected. I often use this trick to call a function that does not interface well with piping.

What is inside the braces happens to be a function call, but it could be any expression involving the . placeholder.
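As a small sketch of that last point, the braces can hold ordinary arithmetic on the placeholder. The data frame `toy` and its values are invented for illustration and are not the data from the question; only magrittr's `%>%` is assumed.

```r
library(magrittr)  # provides %>% (dplyr re-exports the same operator)

# Toy data, invented for this illustration
toy <- data.frame(pre = c(1, 2, 3), post = c(2, 4, 6))

# Inside braces, . is simply the left-hand side; nothing is inserted
# as a first argument.
diff_means <- toy %>% { mean(.$post) - mean(.$pre) }  # a plain arithmetic expression
ratios     <- toy %>% { .$post / .$pre }              # a vector-valued expression
```

Here `diff_means` is a single number and `ratios` is a vector with one entry per row, which shows that the braces are not tied to any particular function.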

asachet

Since you're going from a bunch of data to one (row of) value(s), you're summarizing. In a dplyr pipeline you can then use the summarize function. Within summarize you don't need to subset and can just refer to pre and post directly.

Like so:

dat %>% select(pre, post) %>% summarize(CD = cohensD(pre, post)) 

(The select statement isn't actually necessary in this case, but I left it in to show how this works in a pipeline)
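A self-contained sketch of the same pattern, for readers without lsr installed: `pooled_d` below is a hypothetical stand-in written for this example (a pooled-SD effect size), not lsr's implementation, and `toy` is made-up data.

```r
library(dplyr)

# Toy data, invented for this illustration
toy <- data.frame(subj = 1:4, pre = c(1, 2, 3, 4), post = c(2, 4, 5, 7))

# Hypothetical helper (an assumption, not lsr::cohensD):
# absolute mean difference divided by the pooled standard deviation
pooled_d <- function(x, y) {
  pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                      (length(x) + length(y) - 2))
  abs(mean(x) - mean(y)) / pooled_sd
}

# Column names are resolved inside summarize, so no . or $ is needed
res <- toy %>% summarize(d = pooled_d(pre, post), n = n())
```

The result `res` is a one-row data frame, so further summary columns can be added in the same call.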

Marijn Stevering
  • I think you don't need to explicitly subset with `$`; this should be enough: `dat %>% summarize(CD = cohensD(pre, post))` – Drey Jul 20 '16 at 08:02
  • You're totally right, I remembered I had used subsetting inside a pipeline before and was so focused on making the subset work, I didn't think of just not subsetting in the first place, thanks! – Marijn Stevering Jul 20 '16 at 08:05
  • Why does it not work with the `cohen.d` function from the `effsize` package? `dat %>% summarize(CD = cohen.d(pre, post))` returns an error. – piptoma Jul 20 '16 at 09:02
  • The `cohen.d` function returns a list with additional information (confidence intervals, etc.), whereas summarize expects just one number. You can make it work by extracting just the estimate from the list: `summarize(CD = cohen.d(pre, post)$estimate)` – Marijn Stevering Jul 20 '16 at 09:08
  • Works perfectly, thanks. As I understand your suggested answer, the `select(pre, post)` part is not needed (cf. Drey's answer). Do you consider it necessary or could you delete it? – piptoma Jul 20 '16 at 09:13
  • It isn't necessary. But if there are no previous steps, the whole pipeline is unnecessary and you can just call `summarize(dat, CD = cohensD(pre, post))`. – Marijn Stevering Jul 20 '16 at 09:26

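It doesn't work because the . placeholder has to be used directly as an argument, not inside a nested call (such as .$pre). When . appears only in nested calls, the pipe still inserts the left-hand side as the first argument, so your call effectively becomes cohensD(., .$pre, .$post).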
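The rule can be checked with a minimal sketch. `count_args` is a hypothetical helper written for this example that only reports how many arguments it received, and `toy` is made-up data.

```r
library(magrittr)  # provides %>%

# Toy data, invented for this illustration
toy <- data.frame(pre = c(1, 2, 3), post = c(2, 4, 6))

# Hypothetical helper: returns the number of arguments it was called with
count_args <- function(...) length(list(...))

# . appears only inside nested calls ($), so toy is ALSO inserted first
n_nested <- toy %>% count_args(.$pre, .$post)      # 3 arguments

# . is used directly as an argument, so nothing extra is inserted
n_direct <- toy %>% count_args(1, .)               # 2 arguments

# Braces turn the right-hand side into an expression: no insertion
n_braces <- toy %>% { count_args(.$pre, .$post) }  # 2 arguments
```

The extra first argument in the nested case is what trips up a function that expects a numeric vector there rather than a data frame.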

If you really want to use piping, you can do it with the formula interface, after a little reshaping (melt is from the reshape2 package):

dat %>% select(pre, post) %>% melt %>% cohensD(value~variable, .)
#### [1] 0.8115027
agenis