Let's grab the environments "namespace:stats" and "package:stats":

ns = getNamespace( "stats" )
pkg = as.environment( "package:stats" )

Now let's get the function "sd" from both:

nsSd = get( "sd" , envir = ns , inherits = FALSE )
pkgSd = get( "sd" , envir = pkg , inherits = FALSE )

Are they the same? They are! But what does "same" mean? Reference or value equality?

identical( nsSd , pkgSd )

This implies reference equality, since the following returns FALSE:

test1 = function() {}
test2 = function() {}
identical( test1 , test2 )

But if that's true, it means that an environment's frame can contain function pointers alongside function objects. Further complicating the issue is the fact that a function can "live" in one environment but be told that its executing environment is another environment. Chambers' SoDA doesn't seem to have an answer (it's a dense book, so maybe I missed it!)
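A quick way to see where a function "lives" versus where it executes is environment(), which returns a closure's enclosing environment. A minimal sketch, assuming stats is attached as in a default session; the variable names mirror the ones used above and are only illustrative:

```r
# Look up sd() via the stats namespace and via the attached package environment
ns    <- getNamespace("stats")
pkg   <- as.environment("package:stats")
nsSd  <- get("sd", envir = ns,  inherits = FALSE)
pkgSd <- get("sd", envir = pkg, inherits = FALSE)

# environment() gives the enclosing environment used when the body runs,
# regardless of which frame the binding was fetched from
encl_ns  <- identical(environment(nsSd), ns)    # TRUE
encl_pkg <- identical(environment(pkgSd), ns)   # TRUE
print(environment(pkgSd))                       # <environment: namespace:stats>
```

Both lookups report the stats namespace, so finding the binding in package:stats does not give sd() a different executing environment.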

So, I'd like a definitive answer. Which of the following are correct? Or is there a false trichotomy here?

  1. nsSd and pkgSd are two different objects (albeit copies of each other), where the object in pkgSd has ns as its executing environment
  2. nsSd and pkgSd are pointers to the same object.
  3. nsSd is a pointer to pkgSd and as such they are treated as identical
Suraj
  • Might be worth taking a look at the C code for the internal `identical` function. http://svn.r-project.org/R/trunk/src/main/identical.c – Richie Cotton Feb 14 '12 at 14:56
  • I think you're making life a tad too complicated. We *know* there is only one `sd()` function, so any difference you see is due to your access path via, respectively, environment and namespace. – Dirk Eddelbuettel Feb 14 '12 at 15:18
  • According to the R internals manual the functions are of type `CLOSXP`. Matching pointers count as identical, otherwise it checks for identical `formals`, `body` and whatever `CLOENV(x)` is. – Richie Cotton Feb 14 '12 at 15:21
  • Your second example fails when checking the bodies. `identical(body(test1), body(test2))` is `FALSE`. – Richie Cotton Feb 14 '12 at 15:22
  • Dirk - could you elaborate? I don't know that there's only one sd(); that's why I'm posting =) It seems that you are differentiating between environment and namespace. Do you mean package environment and namespace environment? I think you are saying the answer is either #2 or #3 above? – Suraj Feb 14 '12 at 17:07
  • Richie - thanks for checking the C code! It's strange that identical(body(test1), body(test2)) fails. Seems inconsistent? So maybe the answer here is #2? – Suraj Feb 14 '12 at 17:10
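To make the description of identical() in the comments concrete: most objects are compared by value, but an environment only matches itself, and a closure's enclosing environment is compared that way. A minimal sketch; the names e1, e2, f, g and h are illustrative, and as.function() is used so the closures carry no srcref attributes:

```r
# Two separate environments with identical (empty) contents
e1 <- new.env()
e2 <- new.env()
identical(e1, e2)   # FALSE: environments are compared by reference
identical(e1, e1)   # TRUE

# Closures built without source references, so only formals, body and
# enclosing environment take part in the comparison
f <- as.function(alist(x = , x + 1), envir = e1)
g <- as.function(alist(x = , x + 1), envir = e2)
h <- as.function(alist(x = , x + 1), envir = e1)

identical(f, h)                             # TRUE: distinct objects, same formals/body/env
identical(f, g)                             # FALSE: enclosing environments differ
identical(f, g, ignore.environment = TRUE)  # TRUE
```

So a TRUE from identical() on two closures does not by itself show they are the same object in memory; two separately created closures can compare equal.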

2 Answers

They are pointers to the same object. Using this answer to another question, we can check if two objects refer to the same place in memory.

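are_same <- function(x, y)
{
  # The first line printed by .Internal(inspect()) starts with "@<address>";
  # compare only that token so differing flags or output lengths can't interfere
  addr <- function(obj) sub(" .*", "", capture.output(.Internal(inspect(obj)))[1])
  identical(addr(x), addr(y))
}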

are_same(nsSd, pkgSd) #TRUE
are_same(1:5, 1:5)    #FALSE
Richie Cotton
  • I started a bounty, which I'll award to you to express my thanks (when the 24-hour wait period expires). – Suraj Feb 16 '12 at 15:12
  • Very generous of you. Glad I could be of use. – Richie Cotton Feb 16 '12 at 16:59
  • Very cool function. Until now, I had no idea that when I do `j <- rnorm(1e7); k <- j`, `j` and `k` are just pointers to the same memory location. But when I do `k[1] <- 1`, the whole vector `k` needs to get copied over to a new location, because it's now different. So `k <- j` turns out to be much much faster than `k[1] <- 1`. A nice service R core have provided us there, and I didn't even appreciate it until now! – Josh O'Brien Feb 17 '12 at 07:24
  • @JoshO'Brien: Yes, although R pretends to always pass things by value, in order to achieve better performance, it does secretly just pass references to objects when it can get away with it. – Richie Cotton Feb 17 '12 at 07:44
  • Is this implemented via promises, in the same way lazy evaluation of function arguments is implemented? – Suraj Feb 17 '12 at 13:51
  • @SFun28: I think it depends on the type of object. I recall that environments are always passed as references; other things ... well, you'll have to read R-internals to find out. R is cleverly designed so that you shouldn't ever need to worry about it though (unless you have aspirations to join R-core). http://cran.r-project.org/doc/manuals/R-ints.html – Richie Cotton Feb 17 '12 at 19:56
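The copy-on-modify behaviour described in the comments above can be observed directly. A minimal sketch, assuming the same internal `.Internal(inspect())` entry point the answer uses (it is not a user-level API, and its output format may vary between R versions); `obj_address()` is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: return the "@<address>" token from the first line
# that .Internal(inspect()) prints for an object
obj_address <- function(obj) {
  header <- utils::capture.output(.Internal(inspect(obj)))[1]
  sub(" .*", "", header)
}

j <- c(1, 2, 3)
k <- j                                              # no copy: both names bind one vector
shared_before <- obj_address(j) == obj_address(k)   # TRUE

k[1] <- 10                                          # modifying a shared vector forces a copy
shared_after <- obj_address(j) == obj_address(k)    # FALSE
j[1]                                                # still 1: j was never touched
```

This is why `k <- j` is cheap while the first `k[1] <- ...` pays for a full copy.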
This mostly isn't an answer to your main question. On that issue, though, I agree with Dirk: there is just one sd() function, and depending on the circumstances it can be reached by different scoping paths. For instance, when you type sd(x) at the command line, the function bound to the name sd is found via its entry in the frame of the package:stats environment. When you type stats:::sd(x), or when another function in the stats package calls sd(x), it is found via a search in the namespace:stats environment.
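The two scoping paths can be traced with parent.env() and find(). A minimal sketch, assuming a default session with stats attached; the chain of enclosures shown is how namespaces are set up in current R versions:

```r
ns <- asNamespace("stats")

# Inside stats, free variables are looked up from the namespace outward:
# namespace -> imports:stats -> base namespace -> global env -> search path
environmentName(ns)                                       # "stats"
environmentName(parent.env(ns))                           # "imports:stats"
identical(parent.env(parent.env(ns)), .BaseNamespaceEnv)  # TRUE

# At the console, sd is found by walking the search path instead
find("sd")                                                # "package:stats"

# Both routes end at the same closure
identical(get("sd", envir = ns, inherits = FALSE),
          get("sd", envir = as.environment("package:stats"), inherits = FALSE))
```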


Instead, I just wanted to make the point that your example using test1() and test2() doesn't really imply anything about the "reference equality" of objects that do evaluate to identical. To see the real reason those two are not identical, have a look at their structure as revealed by str():

test1 <- function() {}
test2 <- function() {}
identical( test1 , test2 )
# [1] FALSE

str(test1)
# function ()  
#  - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 13 1 25 13 25 1 1
#   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x01613f54> 

str(test2)
# function ()  
#  - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 13 1 25 13 25 1 1
#   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x01615730> 

If you look at the end of the last line of each str() output above, you will see that the two functions differ in one of their attributes, namely the environment associated with their source files. (I don't know much about that attribute, but that's not really relevant here. The point is that they're not identical!)

If you tell R that you don't want to keep sourcefile attribute data with every function that's created, the 'unexpected' behavior of identical(test1, test2) goes away:

options(keep.source=FALSE)
test1 <- function() {}
test2 <- function() {}
identical( test1 , test2 )
# [1] TRUE
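The srcref attribute can also be inspected without changing global options. A minimal sketch: `parse(text = ..., keep.source = TRUE)` is used here to simulate defining the function at an interactive console, and `utils::removeSource()` strips the source references afterwards. Current R versions also have an `ignore.srcref` argument to identical(); the sketch passes `ignore.srcref = FALSE` explicitly so the attribute is actually compared:

```r
# Each parse() call with keep.source = TRUE gets its own srcfile environment
f1 <- eval(parse(text = "function() {}", keep.source = TRUE)[[1]])
f2 <- eval(parse(text = "function() {}", keep.source = TRUE)[[1]])
f3 <- eval(parse(text = "function() {}", keep.source = FALSE)[[1]])

!is.null(attr(f1, "srcref"))              # TRUE: source reference kept
is.null(attr(f3, "srcref"))               # TRUE: nothing kept
identical(f1, f2, ignore.srcref = FALSE)  # FALSE: srcfile environments differ

# Stripping the source references leaves two value-identical closures
identical(utils::removeSource(f1), utils::removeSource(f2))
```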
Josh O'Brien
  • Thanks, Josh! This was an insightful post. I feel Chambers calling re: the sourcefile attribute =) – Suraj Feb 14 '12 at 19:06
  • @SFun28 - Good to see you back around here. Re: Chambers' SoDA, "dense" is a great description of the book; what makes it amazing to me is that there's not a wasted page in there. It really does repay any effort you put into reading it, which I appreciate. Cheers. – Josh O'Brien Feb 14 '12 at 19:18
  • Agreed... it's chock full of good stuff. This post represents my "missing link" in a quest to write a blog article about how R searches and finds stuff. It's an accumulation of my findings from past posts, Chambers, and other sources, and an attempt to go deeper than `search()`. I don't think any one source clearly explains the process with visuals that are easy to follow. I'd love to send you an advance copy for critique before I post it. – Suraj Feb 14 '12 at 19:39
  • Thanks, Josh! I've got your email copied down. Sorry for the late reply, I had to bolt for Valentine's day stuff =) I hope you can still delete the comment and sorry if not. I'll send you a draft probably in the next two or three weeks. – Suraj Feb 15 '12 at 14:23
  • @SFun28 Great. The comment's deleted, and I'll look forward to seeing the draft once you get a chance to work on it. – Josh O'Brien Feb 15 '12 at 16:36