Hadley's new pryr package that shows the address of a variable is really great for profiling. I have found that whenever a variable is passed into a function, no matter what that function does, a copy of that variable is created. Furthermore, if the body of the function passes the variable into another function, another copy is generated. Here is a clear example
n = 100000
p = 100
bar = function(X) {
print(pryr::address(X))
}
foo = function(X) {
print(pryr::address(X))
bar(X)
}
X = matrix(rnorm(n*p), n, p)
print(pryr::address(X))
foo(X)
Which generates
> X = matrix(rnorm(n*p), n, p)
> print(pryr::address(X))
[1] "0x7f5f6ce0f010"
> foo(X)
[1] "0x92f6d70"
[1] "0x92f3650"
The address changes each time, despite the functions not doing anything. I'm confused by this behavior because I've heard R described as copy on write - so variables can be passed around but copies are only generated when a function wants to write into that variable. What is going on in these function calls?
For best R development is it better to not write multiple small functions, rather keep the content all in one function? I have also found some discussion on Reference Classes, but I see very little R developers using this. Is there another efficient way to pass the variable that I am missing?