
I'm posting this in hopes someone could explain the behavior here. And perhaps this may save others some time in tracking down how to fix a similar error.

The answer is likely somewhere here in this vignette by Hadley Wickham and Lionel Henry. Yet it will take someone like me weeks of study to connect the dots.

I am running a number of queries against a remote database and then combining the results into a single data.table. I add the "part_" prefix to the name of each individual query result and use ls() and mget() with data.table's rbindlist() to combine them.

This works:

results_all <- rbindlist(mget(ls(pattern = "part_")))

I learned that approach, probably from "list data.tables in memory and combine by row (rbind)", and it is certainly a helpful thing to know how to do.

For readability, I often prefer using the magrittr pipe (or chaining with data.table) and especially so with projects like this because I use dplyr to query the database. Yet this code results in an error:

results_all <- ls(pattern = "part_") %>%
  mget() %>%
  rbindlist()

The error reads "Error: value for ‘part_a’ not found", where part_a is the first object name in the character vector returned by ls().

Searching for that error message, I came across the discussion in this data.table GitHub issue. Reading through that, I tried setting "inherits = TRUE" within mget() like so:

results_all <- ls(pattern = "part_") %>%
  mget(inherits = TRUE) %>%
  rbindlist()

And that works. So the error is happening when piping the result of ls() to mget(). And given that nesting ls() within mget() works, my guess is that it is something to do with the pipe and "the enclosing frames of the environment".

In writing this up, I came across "Unexpected error message while joining data.table with rbindlist() using mget()". From the discussion there I found out that this also works:

results_all <- ls(pattern = "part_") %>%
  mget(envir = .GlobalEnv) %>%
  rbindlist()
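
A minimal sketch of a variant that names the lookup environment explicitly in both ls() and mget(), so the result does not depend on where the call happens to be evaluated. It uses base R data frames and do.call(rbind, ...) in place of data.table and rbindlist() to stay self-contained; `combine_parts` and `demo_env` are hypothetical names made up for this illustration.

```r
# Hypothetical helper: list and fetch objects from one explicit environment.
combine_parts <- function(env, pattern = "^part_") {
  nms <- ls(envir = env, pattern = pattern)   # names found in `env` only
  parts <- mget(nms, envir = env)             # values fetched from the same `env`
  do.call(rbind, unname(parts))               # stack the pieces by row
}

# Illustration data kept in its own environment (an assumption for the demo).
demo_env <- new.env()
assign("part_a", data.frame(col1 = 1:3, col2 = letters[1:3]), envir = demo_env)
assign("part_b", data.frame(col1 = 4:6, col2 = letters[4:6]), envir = demo_env)

combined <- combine_parts(demo_env)
print(combined)
```

With the objects in the global environment, as in my case, the same idea would be to pass `.GlobalEnv` as `env`.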

Again, I am hoping someone can explain what is going on for folks looking to learn more about how environments work in R.

Edit: Adding reproducible example

Per the request for a reproducible example, running the code above with these three data.tables (data.frames or tibbles will behave the same) should reproduce the error.

library(data.table)
library(magrittr)

part_a <- data.table(col1 = 1:10, col2 = sample(letters, 10))
part_b <- data.table(col1 = 11:20, col2 = sample(letters, 10))
part_c <- data.table(col1 = 21:30, col2 = sample(letters, 10))

Corey N.
  • Please create a reproducible example including all inputs and `library` statements. See information at top of [tag:r] tag page for posting instructions. – G. Grothendieck Oct 06 '20 at 18:41
  • Okay, I added one. It looks like @bcarlsen explained what's going on already. I would think you would see the same behavior when piping whichever object names are returned by ls() to mget(). – Corey N. Oct 06 '20 at 19:14
  • In the paper you linked, what interests you is the part about lexical side effects. Reading the section about environments in Hadley's Advanced R will also help. – moodymudskipper Oct 07 '20 at 02:11
  • Thank you @Moody_Mudskipper. Advanced R is very well explained and uses plain language. I appreciate you pointing me there (and I've learned more than a few things from your Twitter posts as well). I've learned a lot of tips from SO over the years. Thinking of the "building understanding" SO blog article I referenced in a comment under bcarlsen's answer, I'm wondering why my question has been down-voted twice so far. Perhaps the issue is obvious to many people. Thinking of people transitioning to R from spreadsheets, I would highly doubt that. – Corey N. Oct 08 '20 at 00:29
  • I'm happy I could help, Corey. I don't think you deserve downvotes, and this issue is not obvious at all, so don't feel bad. I believe you were downvoted because your example is not minimal: all the parts about databases, rbindlist, and data frames are irrelevant to the issue. I understand you wanted to set the context, but `x <- 1; y <- 2; ls() %>% mget()` was enough here. – moodymudskipper Oct 08 '20 at 09:55

1 Answer


The rhs argument to a pipe operator (in your example, the expression mget()) is never evaluated as a function call by the interpreter. The pipe operator is an infix function that performs non-standard evaluation of its second argument (rhs). The pipe function composes and performs a new function call using the rhs expression as a sort of "template".

The calling environment of this new function call is the function environment of %>%, not the calling environment of the lhs function or the global environment. .GlobalEnv and the calling environment of the lhs function happen to be the same environment in your example, and that environment is a parent to the function environment of %>%, which is why inherits = TRUE or setting the environment to .GlobalEnv works for you.
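
A rough way to see this without magrittr is to write a toy pipe in base R. The sketch below is an illustration only: `%toy>%`, `demo_toy_pipe`, `here`, and `nms` are hypothetical names, and the toy is not how magrittr is actually implemented. Everything is wrapped in one function so the outcome does not depend on what is in your global environment; that function's frame plays the role `.GlobalEnv` plays in your example.

```r
demo_toy_pipe <- function() {
  # Toy stand-in for a pipe (hypothetical, not magrittr's real code).
  `%toy>%` <- function(lhs, rhs) {
    rhs_call <- substitute(rhs)   # capture e.g. `mget()` without evaluating it
    # Compose a new call with the lhs value inserted as the first argument.
    new_call <- as.call(c(list(rhs_call[[1]], lhs), as.list(rhs_call)[-1]))
    eval(new_call)                # evaluated in this function's own frame
  }

  x <- 1
  y <- 2
  here <- environment()           # the frame where x and y actually live
  nms <- c("x", "y")

  list(
    nested   = mget(nms),         # called directly from the frame holding x and y
    piped    = tryCatch(nms %toy>% mget(),
                        error = function(e) conditionMessage(e)),
    inherits = nms %toy>% mget(inherits = TRUE),  # also searches enclosing frames
    envir    = nms %toy>% mget(envir = here)      # told exactly where to look
  )
}

res <- demo_toy_pipe()
str(res)
```

Here `res$piped` holds an error message rather than a list, because the default lookup happens in the toy pipe's frame, which contains only the pipe's own local variables. The other three elements contain x and y.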

bcarlsen
  • Thank you @bcarlsen. That gives some clues for future study. I guess I'm not understanding why this doesn't happen more often. Is it because most functions inherit the global environment and mget()'s default does not? The Overview on https://magrittr.tidyverse.org says that `x %>% f` is equivalent to `f(x)`. Then it says that's "not technically exact" because of the non-standard evaluation. Yet it also says it "has no practical implication" in many cases. I don't see why evaluating `ls()` first causes an issue here. I have used `%>%` for a while and haven't come across this before. – Corey N. Oct 06 '20 at 20:31
  • In R, the search path within a function environment includes all parent environments. `mget()` is unusual (but hardly unique) in that it is specifically designed not to use the default search path, and defaults to looking only in one specific environment. Using `get()` and `mget()` is a fairly esoteric design choice. The issue is not the order in which stuff gets evaluated. The issue is that the pipe operator is a function. You use `%>%` to compose a function call to `mget()` within its function environment. The default behavior of `mget()` is to only look in its calling environment. – bcarlsen Oct 06 '20 at 21:44
  • The search path is used when evaluating a variable, so tricky functions would be the ones using NSE and evaluating in unexpected places. `mget()` takes variable names, not variables, and it has an `envir` argument, so it's fairly unambiguous. However, what is ambiguous is where the pipe evaluates its steps. `mget()` by default will look for variables in its parent frame, and this is not where you ran `ls()`, though it could very well have been with a different design (hence the paper you linked). – moodymudskipper Oct 07 '20 at 02:17
  • You could also try `"part_" %>% ls(pattern = .)`, which would show you another example of this counterintuitive behavior. – moodymudskipper Oct 07 '20 at 02:19
  • Thank you both. The difference between variables/objects and their names is a tough one to grasp. Coincidentally, SO is showing [this blog article about building understanding vs learning fast](https://stackoverflow.blog/2020/10/05/play-the-long-game-when-learning-to-code/?cb=1) alongside this post for me. It really helps when the R community comes out to help non-programmer types like me build understanding. So thank you. – Corey N. Oct 08 '20 at 00:16
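
The point made in the comments above, that `ls()` (like `mget()`) looks by default only in the frame it is called from, can be sketched in base R without any pipe. This is an illustration under stated assumptions: `demo_ls_frames`, `list_here`, and `list_there` are hypothetical names, and the wrapper functions merely stand in for any function that sits between your code and the call to `ls()`.

```r
demo_ls_frames <- function() {
  part_a <- 1
  part_b <- 2

  # ls() is called from inside the wrapper, so it lists the wrapper's frame,
  # which holds only the argument `pattern`.
  list_here <- function(pattern) ls(pattern = pattern)

  # Passing the caller's frame explicitly restores the expected lookup.
  list_there <- function(pattern, env = parent.frame()) {
    ls(envir = env, pattern = pattern)
  }

  list(
    direct   = ls(pattern = "part_"),   # called from the frame holding part_a/part_b
    wrapped  = list_here("part_"),      # looks in the wrapper's own frame
    explicit = list_there("part_")      # looks in the caller's frame
  )
}

res_ls <- demo_ls_frames()
str(res_ls)
```

`res_ls$wrapped` comes back empty while the other two list part_a and part_b, which mirrors what happens when a function-based pipe sits between you and `ls()` or `mget()`.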