Recently I learned that in R there are no references; rather, all objects are immutable and each assignment makes a copy.

Uh-oh.

Copying large matrices over and over seems pretty horrible...

Now I'm paranoid, copy-pasting code all the time because I'm afraid of writing helper functions (passing parameters = assignment? returning values = assignment?), and afraid of introducing helper variables unless I'm 100% sure the object would be copied anyway...

Example:

What I would love to make:

foo = function(someGivenLargeObject) {
    returnedMatrix = someGivenLargeObject$someLargeMatrix # <- BAD?!?!?!?!
    if(someCondition)
        returnedMatrix = operateOn(returnedMatrix)
    if(otherCondition)
        returnedMatrix = operateOn(returnedMatrix)
    returnedMatrix
}

What I'm making instead:

foo = function(someGivenLargeObject) { # <- still BAD?!?!?!
    returnedMatrix = NULL # <- No copy of someLargeMatrix is made!
    if(someCondition)
        returnedMatrix = operateOn(someGivenLargeObject$someLargeMatrix)
    if(otherCondition)
        returnedMatrix = operateOn(
            if(is.null(returnedMatrix)) 
                someGivenLargeObject$someLargeMatrix
            else
                returnedMatrix
        ) # <- ^ Incredible clutter! Unreadable!
    if(is.null(returnedMatrix))
        return(someGivenLargeObject$someLargeMatrix)
    else
        return(returnedMatrix) # <- does return copy stuff?!?!?!?!
}

The readability loss in the second version of the function is pretty amazing IMO; yet is this the price of avoiding the unnecessary copying of someLargeMatrix in case neither someCondition nor otherCondition holds? Because the line returnedMatrix = someGivenLargeObject$someLargeMatrix would necessitate this copying?

Or am I being paranoid, and may I safely go with the more readable version of the function because making a reference to someLargeMatrix doesn't necessitate copying? (BUT THERE ARE NO REFERENCES IN R!!!)

Also, I hope that a function call / function return doesn't copy stuff either?

Side note, just to be clear: I haven't yet run into an issue where I knew an object was copied unnecessarily in a situation like the one described above. I'm just perplexed by having read that "there are no references in R", so this question is based on my worries about what this lack of references might imply, rather than on any empirical observation.

  • https://www.stat.berkeley.edu/~paciorek/computingTips/Pointers_passing_reference_.html – Bulat Apr 15 '19 at 21:53
  • Just so you know, it is false that R “always copies on assignment”. R implements a version of reference counting that allows it to avoid copying objects as much as possible. Of course, it will never be as copy-free as a language with pointers and such. Also, given the flexibility of the R language, it can sometimes be surprising when R is forced to copy because it cannot adequately predict what an arbitrary function might do to an object. – joran Apr 15 '19 at 21:54
  • https://stackoverflow.com/questions/33186633/r-pass-data-frame-by-reference-to-a-function if you can use data.table – Bulat Apr 15 '19 at 21:56
  • You might find this interesting to read: https://adv-r.hadley.nz/names-values.html#copy-on-modify. In other words, I think in your case it is fine to use the first version: since you do not modify unless a condition applies, you do not copy unless a condition applies. – Calum You Apr 16 '19 at 06:14
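
One way to check the point made in the comments is to trace the matrix while calling the readable version of foo. This is only a sketch: bigMatrix, the body of operateOn, and the two condition arguments are made-up stand-ins for the names in the question, and tracemem() only works on R builds with memory profiling enabled, so the calls are guarded.

```r
# Hypothetical stand-ins for the names used in the question.
bigMatrix <- matrix(0, nrow = 1000, ncol = 1000)
someGivenLargeObject <- list(someLargeMatrix = bigMatrix)

# Assumed helper: any function that modifies its argument will do.
operateOn <- function(m) {
    m[1, 1] <- m[1, 1] + 1   # modifying a shared object forces a copy here
    m
}

# The readable version from the question, with the conditions as arguments.
foo <- function(someGivenLargeObject, someCondition = FALSE, otherCondition = FALSE) {
    returnedMatrix <- someGivenLargeObject$someLargeMatrix  # binds a name only
    if (someCondition)
        returnedMatrix <- operateOn(returnedMatrix)
    if (otherCondition)
        returnedMatrix <- operateOn(returnedMatrix)
    returnedMatrix
}

# tracemem() reports each duplication of the traced object.
if (capabilities("profmem")) invisible(tracemem(bigMatrix))

unchanged <- foo(someGivenLargeObject)                        # should report no copy
changed   <- foo(someGivenLargeObject, someCondition = TRUE)  # copy inside operateOn

if (capabilities("profmem")) untracemem(bigMatrix)

bigMatrix[1, 1]  # 0: the original is untouched
changed[1, 1]    # 1: only the modified result differs
```

When neither condition holds, the call should produce no tracemem message, which suggests that passing the object in, assigning it to a helper variable, and returning it do not by themselves duplicate the matrix.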

1 Answer

Donald Knuth famously said "premature optimization is the root of all evil":

http://wiki.c2.com/?PrematureOptimization

It is good to be aware of this, but code clarity is in most cases more important. R is usually smart enough to figure out when a copy is needed. Not every assignment causes a copy: a copy is made only when an object that is shared between several names is later modified through one of them.
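
As a small illustration of that last point (a sketch with made-up variable names, not a benchmark): assigning a vector to a second name is cheap, and the duplication happens only at the first modification through one of the names.

```r
big <- numeric(1e6)     # a vector of one million zeros

alias <- big            # no copy yet: both names point at the same data

# Guarded because tracemem() needs an R build with memory profiling.
if (capabilities("profmem")) invisible(tracemem(big))

alias[1] <- 1           # the copy happens here, on modification

if (capabilities("profmem")) untracemem(big)

big[1]    # 0: the original is unchanged
alias[1]  # 1: only the modified copy differs
```

On a build with memory profiling, a tracemem message should appear at the line that modifies alias and not at the plain assignment before it.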

Carlos Santillan