
By simply changing the argument order in the join step, I can get the code below to run. I just installed the most recent version of Tidyverse as of this post (1.3.1), and I'm using R version 4.1.1 (2021-08-10), "Kick Things". Please end my madness:

Updates:

  • If you run the pipe without the join statement, the assignment works fine (odd).
  • I had an old version of the tidyverse (which I foolishly did not record), and the code would run. It no longer runs with the latest version of the tidyverse. Not to complicate things too much, but I did this on a different machine with R version 3.6.3 (2020-02-29).

library(dplyr)

#Doesn't run
if(exists("test")) rm("test")
iris%>%
  assign(x = "test",value = .,envir = .GlobalEnv)%>%
  left_join(x = test,y =. ,by="Species")

#Runs
if(exists("test")) rm("test")
iris%>%
  assign(x = "test",value = .,envir = .GlobalEnv)%>%
  left_join(x = .,y =test ,by="Species")

Aegis
  • I hate that you're `assign`ing a value inside a pipeline and trying to use it later in the same pipeline--but I also find it very confusing that it works one way and not the other. Begrudging upvote. – Gregor Thomas Feb 02 '22 at 19:43
  • I'll definitely agree, I've never loved it. It seems so unnatural. I just prefer it to intermediate steps, as it prevents confusion when I run blocks of code by themselves (it is easy to forget to run intermediate steps). – Aegis Feb 02 '22 at 19:47
  • Not at all a duplicate but related reading: https://stackoverflow.com/q/40369832/903061 – Gregor Thomas Feb 02 '22 at 19:49

1 Answer


The pipe makes things a little more confusing here, but we get the same effect if we write the same code as nested functions:

#Doesn't run
if(exists("test")) rm("test")
left_join(x = test, y = assign("test", iris, envir = .GlobalEnv), by = "Species")

#Runs
if(exists("test")) rm("test")
left_join(x = assign("test", iris, envir = .GlobalEnv), y = test, by = "Species")

Written out like this, it makes sense why the first version doesn't run: you are calling left_join on an object that does not exist yet. Since left_join is an S3 generic, it evaluates only x to determine method dispatch and passes all the other arguments as unevaluated promises to left_join.data.frame. Because y has not been evaluated at that point, test has not been written, so we get an "object 'test' not found" error.

In the second version, the y parameter isn't evaluated until it is required inside left_join.data.frame, and by the time it is evaluated, test has already been written.

So this odd behaviour is a result of lazy evaluation.
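
The same ordering can be reproduced in base R without dplyr. The sketch below is only an illustration: `join_demo`, its method, and the `clear_test` helper are hypothetical stand-ins for `left_join` and the `rm()` line above, not anything dplyr provides.

```r
# Hypothetical S3 generic standing in for dplyr::left_join()
join_demo <- function(x, y) UseMethod("join_demo")

join_demo.data.frame <- function(x, y) {
  # x was forced during dispatch; y stays a promise until it is used here
  c(rows_x = nrow(x), rows_y = nrow(y))
}

# Hypothetical helper: remove `test` from the global environment if present
clear_test <- function() {
  if (exists("test", envir = .GlobalEnv, inherits = FALSE)) {
    rm("test", envir = .GlobalEnv)
  }
}

# Runs: dispatch forces x, so assign() has created `test` before y is needed
clear_test()
ok <- join_demo(x = assign("test", iris, envir = .GlobalEnv), y = test)

# Fails: dispatch forces x = test before the assign() in y has ever run
clear_test()
failed <- tryCatch(
  join_demo(x = test, y = assign("test", iris, envir = .GlobalEnv)),
  error = function(e) conditionMessage(e)
)
```

Here `ok` holds the two row counts, while `failed` holds the error message raised when dispatch tries to evaluate `test`.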

Allan Cameron
  • You are quite right, nicely put. Based on my earlier post, this must have been a change at some point. I think it should evaluate in the order of the pipe. That is the expectation of the user, but again, your logic is undeniable. I opened this as a bug on Cross Validated, but maybe will put it as a feature request. – Aegis Feb 02 '22 at 20:52
  • To make the evaluation eager, use magrittr's `%!>%` pipe. – Aegis Feb 03 '22 at 19:49
  • That works. Or as suggested in the [docs](https://magrittr.tidyverse.org/reference/pipe-eager.html) for the eager pipe (`%!>%`) you can just wrap `test` in `base::force()` and it will work with the standard pipe (`%>%`). – Dan Adams Feb 06 '22 at 19:30
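
The idea behind forcing evaluation can be sketched in base R. This is a rough illustration under assumptions, not how magrittr implements `%!>%`: `combine_demo` and `combine_strict` are hypothetical names, and the wrapper simply calls `force()` on `y` before dispatch gets a chance to look at `x`.

```r
# Hypothetical generic, standing in for dplyr::left_join()
combine_demo <- function(x, y) UseMethod("combine_demo")
combine_demo.data.frame <- function(x, y) nrow(x) + nrow(y)

# Hypothetical strict wrapper: evaluate y first, then dispatch on x
combine_strict <- function(x, y) {
  force(y)            # runs the assign(), so `test` now exists
  combine_demo(x, y)  # dispatch forces x = test, which is now found
}

if (exists("test", envir = .GlobalEnv, inherits = FALSE)) {
  rm("test", envir = .GlobalEnv)
}

# The argument order that failed before now works, because y is forced first
total_rows <- combine_strict(
  x = test,
  y = assign("test", iris, envir = .GlobalEnv)
)
```

With the wrapper in place, the argument order that failed in the question evaluates without error, because the side effect in `y` happens before `x` is looked up.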