After a long history with C and C++, I've only relatively recently become a programming-language polyglot and started working in earnest with newer languages, specifically Groovy and, to a far lesser degree, Python.

RE: "pass-by-reference languages", I'd like to understand the rule (or rule of thumb) for which variables are passed by reference and which are passed by value ^.

I've written two equivalent programs in Jenkins/Groovy and Python. The programs demonstrate that an integer is passed by value, but a dictionary is passed by reference:

def dfunc(v) {
  v.foo = "dolor"
  v = [a:"a", b:"b"]
}

def podfunc(v) { v = 42 }

pipeline {
  agent any

  stages {
    stage( "1" ) {
      steps {
        script {
          def my_dict = [foo: "lorem", bar: "ipsum"]
          def my_int = 5

          echo my_dict.toString()
          dfunc(my_dict)
          echo my_dict.toString()

          echo my_int.toString()
          podfunc(my_int)
          echo my_int.toString()
        }
      }
    }
  }
}

def dfunc(d):
    d["foo"] = "dolor"
    d = { "a": "a", "b": "b" }
    
def podfunc(d):
    d = 42

my_dict = { "foo": "lorem", "bar": "ipsum" }
my_int = 5

print(my_dict)
dfunc(my_dict)
print(my_dict)

print(my_int)
podfunc(my_int)
print(my_int)

The output of both programs is effectively:

{'foo': 'lorem', 'bar': 'ipsum'}
{'foo': 'dolor', 'bar': 'ipsum'}
5
5

(with language-specific variation on bracing, etc.)

I think that another way of looking at this -- but I'm not sure if this is correct -- is perhaps that the languages aren't so much "pass-by-reference" as they are "pass-by-value, but certain variables are implicitly created on the heap".

In other words, I'm inferring that in Groovy and Python, when I call either dfunc(d) or podfunc(d), in both cases, I am technically passing by value; it's just "primitives" like the implicit integer my_int are created on the stack, and "other variables", like the implicit dictionary my_dict are created on the heap.
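
A quick check in Python, offered as a sketch rather than as part of the original programs, hints that the stack/heap split may not be the right model there. The helper `same_object` is a hypothetical name introduced only for this illustration. It compares the identity of the object the parameter is bound to with the identity of the caller's object, for both the integer and the dictionary:

```python
def same_object(param, original_id):
    # id() returns an identifier that is unique to an object for its lifetime,
    # so equal ids mean the parameter is bound to the caller's object itself.
    return id(param) == original_id

my_dict = {"foo": "lorem", "bar": "ipsum"}
my_int = 5

dict_shared = same_object(my_dict, id(my_dict))
int_shared = same_object(my_int, id(my_int))

print(dict_shared, int_shared)  # True True
```

Both checks give the same answer, which suggests the integer and the dictionary are handed to the function in the same way, and that the visible difference comes from what the function then does with the parameter (mutating the object versus rebinding the name).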

Is this a correct interpretation of why some variables appear to be passed by value and others passed by reference?

If so, which variables are implicitly created on the stack/heap, and why? And is this coder-controllable?
To my poor C-biased eyes, there's nothing about these two variables' declarations that self-evidences which is created on the stack and which is created on the heap:

def my_dict = [foo: "lorem", bar: "ipsum"]
def my_int = 5

The closest existing discussions I was able to find were these:
Groovy (or Java) - Pass by reference into a wrapper object
What's the difference between passing by reference vs. passing by value?
...but either I'm not fully appreciating the existing answers, or they're not quite addressing my question as stated above.

Thanks for helping me grok this.


^ @juanpa.arrivillaga corrected me: the programs do not demonstrate that anything is passed by reference - this question, inevitably, becomes about the misapplication of the phrase "pass by reference".

StoneThrow
  • My understanding is that Python *always* passes by reference. What's copied over is the pointer to that reference. So the `d` inside `podfunc` is not the same `d` as the parent scope (they point to the same place, but are actually different pointers). When you re-assign it (`d = 42`), you're actually changing the reference of the local copy of the pointer, not the original one – Matias Cicero Sep 24 '21 at 23:47
  • All arguments are references in Python, as all names are references. The difference in your example comes from the fact that you **modify** the dictionary and then **replace** the integer. – Klaus D. Sep 24 '21 at 23:53
  • Python **never** passes by reference. Python's evaluation strategy is *neither call by reference nor call by value*. It is "call by object sharing", although that name is a bit arcane, and various other terms are used. – juanpa.arrivillaga Sep 25 '21 at 00:05
  • See [this answer to a related question](https://stackoverflow.com/a/986145/5014455) for a good explanation of Python's evaluation strategy – juanpa.arrivillaga Sep 25 '21 at 00:10
  • Note your example *does not demonstrate that dictionaries are passed by reference*. To demonstrate pass by reference, *assignment to the parameter would be visible in the caller*. That is *never the case in Python* – juanpa.arrivillaga Sep 25 '21 at 00:12
  • @juanpa.arrivillaga passing "by reference" in C is just a commonly recognized convention whereby you pass the pointer to something rather than the value of something. In that respect it behaves exactly like Python: you may update the value pointed to by the pointer but updating the pointer itself has no effect outside the function, since the pointer was passed by value. – Matias Cicero Sep 25 '21 at 01:09
  • @MatiasCicero **that isn't pass by reference** at all. That is a *fundamental misunderstanding*. It is simply a fact that C, very famously, only supports call by value. Passing a pointer by value is *not* pass by reference. And Python uses *neither evaluation strategy*. These are just basic issues of terminology. The fact that people frequently use the terminology incorrectly is no excuse to perpetuate that. – juanpa.arrivillaga Sep 25 '21 at 01:16
  • @juanpa.arrivillaga - If I may validate my attempt at understanding using `C` terminology: in `Python`/`Groovy`, _everything_ is assigned on the heap, variables are always implicitly pointers, "using" a variable always implicitly dereferences the pointer, _and_ there is no explicit dereference operator, and function args are always passed by value. I hope that's correct - because I _think_ that explanation fits the pattern of my code, and aligns with the answer in that link you posted, which itself links to a Python FAQ (?) – StoneThrow Sep 25 '21 at 11:53
  • @juanpa.arrivillaga - if I may ask to clarify on one more point: am I correct to say "Python is not pass by reference"? I think you said that explicitly in your earlier comment, but I wanted to be doubly-certain because _the opposite_ has been stated so often and so widely that that notion had become cemented in my mind. Therefore, the `C++` function `void foo(int& i)` represents what can more properly be called "pass by reference" -- true? – StoneThrow Sep 25 '21 at 12:26
  • @StoneThrow yes, exactly. You can define a `foo(int& i)` and that does `i = 42`, and in some other function, do `foo(x)` and now *in the caller's scope* `x` will be `42`, this is not possible in python. Essentially, in call by reference, the parameter will act as an *alias to the variable in the caller's scope*. Also, since the value is not copied in Python, it isn't call by value either – juanpa.arrivillaga Sep 25 '21 at 21:07
  • @StoneThrow yes, you can think of variables in Python as acting like pointers that automatically get dereferenced. IOW python has "reference semantics". But note it isn't accurate to say Python is *call by value* either. The academic term is "call by sharing" coined by Barbara Liskov (famous for the Liskov Substitution Principle). But that term isn't commonly used. Here is a really good article that disambiguated these three evaluation strategies: https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/ – juanpa.arrivillaga Sep 25 '21 at 21:11
  • Also, yes in CPython all objects are allocated on a privately managed heap, although, this is an *implementation detail*. – juanpa.arrivillaga Sep 25 '21 at 21:15
  • @juanpa.arrivillaga: "But note it isn't accurate to say Python is call by value either." The passing semantics of Python are exactly identical to Java, and Java is universally described, both on this site and elsewhere, as pass-by-value only. For consistency, Python should also be described as pass-by-value only. – newacct Sep 26 '21 at 21:53
  • @newacct No. The Java community is not one to emulate here. The fundamental problem is that Python is a purely object oriented language, whereas Java isn't, there are "reference types and primitive types". So to square that circle, it is often described as "call by value but references types pass references by value". It confuses the *semantics* with the implementation. Consistency with something silly isn't a virtue. See [this section on call by sharing](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing) – juanpa.arrivillaga Sep 26 '21 at 22:39
  • The fundamental problem is that there is a false dichotomy: languages either use call by value or they use call by reference. If they don't use one, that implies they must use the other. This is false. There are [many different evaluation strategies](https://en.wikipedia.org/wiki/Evaluation_strategy). Indeed, most modern languages use *neither* call by value nor call by reference. But a combination of this false dichotomy and, I think, poor translations of course concepts in introductory programming classes from C/C++ to Java to eventually Python, have led to where we are today – juanpa.arrivillaga Sep 26 '21 at 22:55
  • @juanpa.arrivillaga: Value types can be considered equivalent to reference types to immutable objects. In any case, whether something is a reference type is subjective and is irrelevant. Passing semantics should be defined clearly based on semantics of the language operations alone. There doesn't have to be "many different evaluation strategies" if they are clearly defined. You may disagree because you have a different notion of what these terms mean. But do you have a clear, objective definition that is based only on the semantics? – newacct Sep 27 '21 at 07:39
  • @juanpa.arrivillaga: Here is an objective, semantic definition: It is pass-by-value if assigning to a parameter inside a function has no effect on a passed variable in the calling scope. It is pass-by-reference if assigning to a parameter inside a function has the same effect as assigning to a passed variable in the calling scope. Under this definition, C, Java, Python, Ruby, and Scheme are all solely pass-by-value. Under this definition, C++, PHP, C#, and Swift are by default pass-by-value, but parameters can be explicitly marked to be pass-by-reference (and this is true regardless of type). – newacct Sep 27 '21 at 07:40
  • @juanpa.arrivillaga: (And of course this definition doesn't apply for languages that don't allow assignment to local variables, like ML and Haskell; or you can consider there to be no distinction between pass-by-value or pass-by-reference for those languages.) I don't claim that every language fits into this definition, but (other than the languages that don't allow assignment to variables) as you can see from my list above, most modern languages do in fact fit well into this definition. – newacct Sep 27 '21 at 07:42
  • @newacct It is a false dichotomy. In call by name, for example, assignments won't affect variables in the callers scope, but it is *neither call by reference nor call by value*. Call by reference and call by value are terms that were used to describe the evaluation semantics in languages like C and Fortran, where variables *are storage areas*, i.e., some piece of memory. In Python, variables *are not storage areas*, they are names in namespaces (literally entries in a `dict` object in global scopes). Importantly, in call by value, whatever object you pass, *the object is copied*. – juanpa.arrivillaga Sep 28 '21 at 18:18
  • @newacct anyway, I'd love to continue this discussion somewhere other than the comment section here, which really isn't meant for this – juanpa.arrivillaga Sep 28 '21 at 18:18
  • @juanpa.arrivillaga: "Importantly, in call by value, whatever object you pass, the object is copied." But who says the value you are passing is an object? Consider the semantics of assignment. If objects were the values in the language, then two variables would hold two separate objects, and assignment would copy the state of one into the other, but they would still be two separate objects. But this is not what assignment does in Python; you get two variables that point to the same shared object. – newacct Oct 03 '21 at 05:50
  • @juanpa.arrivillaga: This is exactly what you would expect if the "value" that variables hold is a "pointer/reference to an object" or "information that tells you which object is pointed to". Or in C++ syntax, it's as if every value in the language has type `SomeClass *`. If a C++ function had parameter `SomeClass *` (no `&`), a C++ purist would say that is pass-by-value, because the value passed is of a pointer type, and the pointer is copied. The object that is pointed to is not copied, but that is not relevant, because an object is not what is passed. – newacct Oct 03 '21 at 05:53
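
The observation about assignment in the last comments can be checked directly in Python. The following is a minimal sketch (the variable names are illustrative only): plain assignment binds a second name to the same object, while a copy has to be requested explicitly, here with `copy.copy`.

```python
import copy

a = {"foo": "lorem"}
b = a              # plain assignment: b is a second name for the same dict
c = copy.copy(a)   # an explicit (shallow) copy: c is a separate dict

b["foo"] = "dolor"  # mutate the object through the name b

print(a["foo"])  # dolor
print(c["foo"])  # lorem
print(b is a, c is a)  # True False
```

Binding an argument to a parameter behaves like the `b = a` line: no copy of the object is made, whichever name one prefers for that evaluation strategy.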

1 Answer

This answer focuses on Groovy/JVM, but I believe that conceptually it works similarly in Python and any other higher level language that does not expose pointers directly.

Groovy and Python (to my knowledge) do not have pointers a programmer can use within the language. However, variables of non-primitive types (at least in Groovy/Java), or Objects in other words, are passed "by reference" in a sense... except that you cannot access the pointer directly, so it's not possible to modify the caller's pointer; you can only modify the data the Object holds by re-assigning its fields.

Objects are generally allocated on the heap, but the JVM may skip the heap allocation as an optimisation based on what's called escape analysis, i.e. if it can prove the object is only needed in the local method, it may put the memory directly on the stack (TBH I am not sure when that's done and whether it's common). This shouldn't matter for the semantics of the program, though, because the difference between stack and heap allocation is not exposed to JVM programs.

Primitives, as you found out, are passed by value... immutable objects are also "semantically" passed by value (in fact, the JVM will add value types soon when Project Valhalla is ready), despite the fact that in reality they're just pointers: that's because you cannot reassign the pointer you're given, and the object being immutable means you cannot reassign any pointers it may have itself either: effectively, it's a value.

But when you pass a mutable object to a method, then you're allowed to change "pointers" inside the object (its fields), but not the data pointed to by the Object reference itself - because that's not exposed or accessible in Java/Groovy... so it's a bit similar to when you pass something by reference in languages that have pointers, but not the same.

A few examples:

import groovy.transform.Immutable 
import groovy.transform.Canonical

@Immutable
class Person {
    String name
}

@Canonical
class MutablePerson {
    String name
}

def takesPerson(Person p) {
    // nothing we can change on p as it's immutable,
    // effectively `p` is a "value", not reference.
    println p

    // we are allowed to re-assign p locally, but the
    // caller's reference is not affected. 
    p = null
}

def takesMPerson(MutablePerson p) {
    // we're allowed to change the internal structure of p
    p.name = 'Eva'

    // again, re-assigning p here won't change the caller's ref
    p = null
}

person = new Person(name: 'Joe')
mPerson = new MutablePerson(name: 'Mary')

println "Person before call: $person"
println "MutablePerson before call: $mPerson"

takesPerson person
takesMPerson mPerson

println "Person after call: $person"
println "MutablePerson after call: $mPerson"

Result:

Person before call: Person(Joe)
MutablePerson before call: MutablePerson(Mary)
Person(Joe)
Person after call: Person(Joe)
MutablePerson after call: MutablePerson(Eva)
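
For comparison, here is a rough Python analogue of the same experiment, a sketch under the assumption that a frozen dataclass is a fair stand-in for Groovy's `@Immutable` and a regular dataclass for `@Canonical`. The function names `takes_person` and `takes_mperson` are just Python-style renderings of the ones above.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str

@dataclass
class MutablePerson:
    name: str

def takes_person(p):
    try:
        p.name = "Eva"  # a frozen dataclass rejects attribute assignment
    except FrozenInstanceError:
        pass
    p = None  # rebinds the local name only; the caller's variable is untouched

def takes_mperson(p):
    p.name = "Eva"  # mutates the object the caller also refers to
    p = None        # again, only the local name is rebound

person = Person(name="Joe")
m_person = MutablePerson(name="Mary")

takes_person(person)
takes_mperson(m_person)

print(person.name, m_person.name)  # Joe Eva
```

As in the Groovy version, the only change visible to the caller is the mutation of the mutable object's field.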

To my poor C-biased eyes, there's nothing about these two variables' declarations that self-evidences which is created on the stack and which is created on the heap:

def my_dict = [foo: "lorem", bar: "ipsum"]
def my_int = 5

In Java, you know when something is probably going to go on the heap because you use the new keyword to create it (as opposed to literals which never need to go on the heap, and String which can be created with double-quotes in which case they MAY or MAY NOT allocate memory depending on whether it's a constant - static final without initialization logic - or it has been interned). But in Groovy, that's not the only case where data is likely being allocated on the heap because Groovy has more literals than Java, e.g. Lists, Sets and Maps, that also go on the heap (with caveats, as I mentioned earlier).

But if you just know that behind the scenes, Groovy collection literals are just calling Java's new CollectionType() and then adding the values you give into it, then you can have a reasonable guess at what is going to be allocated on the heap and what's not.

The more relevant question to the Groovy programmer is what data they can change and what's the effect of re-assigning something.

Hopefully, the above examples make clear what you can change: non-final local variables and class fields. And because method arguments are just local variables in JVM methods, re-assigning them only has an effect within the method body. Re-assigning a class field, on the other hand, has an effect for as long as the class instance is "alive", or until the field is re-assigned again.

Data structures may seem to make this more complicated, but it's just the same thing as they are all implemented using classes and fields. When you change a Groovy Map (or dictionary), it's just re-assigning a field somewhere.

You can see that by implementing a little data structure, say an extremely simplified linked-list:

import groovy.transform.Canonical

@Canonical
class MyLinkedList {
  def value
  def next
}

def singleItemList = new MyLinkedList(value: 'single item')

def list = new MyLinkedList(value: 'first item',
                            next: singleItemList)

println singleItemList
println list

// modify list
list.next = new MyLinkedList(value: 'new item')

println list

// same kind of thing happens when you modify a Map
def map = [foo: 'lorem', bar: 'ipsum']

println map

map.foo = 'maximum'

println map

Result:

MyLinkedList(single item, null)
MyLinkedList(first item, MyLinkedList(single item, null))
MyLinkedList(first item, MyLinkedList(new item, null))
[foo:lorem, bar:ipsum]
[foo:maximum, bar:ipsum]
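
To make the "changing a map is just re-assigning a field somewhere" idea concrete, here is a toy map sketched in Python. `ToyMap`, `Entry` and `update` are hypothetical names made up for this illustration; this is not how Groovy's or Java's real map classes are implemented, only a minimal model of the same principle.

```python
class Entry:
    def __init__(self, key, value, next_entry=None):
        self.key = key
        self.value = value
        self.next_entry = next_entry

class ToyMap:
    def __init__(self):
        self.head = None  # field referring to the first Entry, if any

    def put(self, key, value):
        entry = self.head
        while entry is not None:
            if entry.key == key:
                entry.value = value  # re-assign a field on an existing Entry
                return
            entry = entry.next_entry
        # key not found: re-assign the head field to a new Entry
        self.head = Entry(key, value, self.head)

    def get(self, key):
        entry = self.head
        while entry is not None:
            if entry.key == key:
                return entry.value
            entry = entry.next_entry
        raise KeyError(key)

def update(m):
    # the callee mutates the same ToyMap object the caller holds
    m.put("foo", "maximum")

toy = ToyMap()
toy.put("foo", "lorem")
toy.put("bar", "ipsum")
update(toy)

print(toy.get("foo"), toy.get("bar"))  # maximum ipsum
```

The caller sees the new value because `update` re-assigned a field inside an object both sides share, not because the variable `toy` itself was changed.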
Renato
  • "But when you pass a mutable object to a method, then you're allowed to change "pointers" inside the object (its fields), but not the data pointed to by the Object reference itself - because that's not exposed or accessible in Java/Groovy... so it's a bit similar to when you pass something by reference in languages that have pointers, but not the same." No, that isn't like pass by reference *at all*. – juanpa.arrivillaga Sep 28 '21 at 18:15
  • Here is the key feature of pass by reference, `x = something; def foo(&var): var = something_else; foo(x); print(x)` will print something_else. i.e. *assignment to the parameter passed by reference is seen in the caller scope for the variable provided as an argument*. You are just describing passing pointers. This is possible in C, for example, but *C is definitely only call by value* – juanpa.arrivillaga Sep 28 '21 at 18:21
  • You have a definition of `pass by reference` that I tried to make clear is not possible in the JVM. But because any non-primitive Object in the JVM is always a pointer to the actual data, it's definitely a "reference". Because when you call a method and pass in an Object as an argument, the method gets something that points to the same data as the caller, it can be said that what you passed in is a reference. Pass by value would mean that changes to the "value" wouldn't reflect the data of the caller, but it does. Therefore, I say pass-by-reference is closer to the truth than pass-by-value. – Renato Sep 29 '21 at 11:30
  • Let's save future readers of this useless debate: https://stackoverflow.com/questions/373419/whats-the-difference-between-passing-by-reference-vs-passing-by-value Java and C are pass-by-value only if you use the old, strict definition of the phrase, ignoring what a modern programmer might infer from the phrase and the meaning of "value" and "reference" instead. – Renato Sep 29 '21 at 11:56
  • It's not a debate. Words have meanings. It's not like C++ and C#, both languages that support call by reference, aren't in heavy use *today*. Heck, even *Fortran* is still used today. Just because programmers have used the terminology incorrectly doesn't make it a "debate". And Python **is not** pass by value either. – juanpa.arrivillaga Sep 29 '21 at 15:01
  • @juanpa.arrivillaga https://en.wikipedia.org/wiki/Reference_(computer_science) – Renato Sep 30 '21 at 19:03
  • Again, I'm not sure what point you think you are making. Yes, that is a link to a description of what "a reference" means in computer science. [Here is a link to what the term "call by reference" means](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_reference) in computer science. You'll notice that it *doesn't* mean what you are using it to mean. Again, *there is no controversy here*. Call by reference is a well-defined, and well understood evaluation strategy. It isn't "the old definition", *modern programmers use it today*. But not in Python, because Python doesn't support it – juanpa.arrivillaga Sep 30 '21 at 19:37
  • Here's the classic litmus test for whether you support call by reference: write a `swap(a, b)` function, that *swaps the contents of the variables in the caller*. So suppose you are in another function, `def foo()...` and there is `x = 1` and `y=2`. Can you write a function that will swap these variables if you call it like `x = 1; y = 2; swap(x, y); print(x, y)` will print `2 1`? You can't do that in Python. You could if Python were call by reference, trivially. – juanpa.arrivillaga Sep 30 '21 at 19:40
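
The swap litmus test from the last comment can be tried out as follows. This is a minimal sketch; `swap_in_place` is a hypothetical helper added for contrast, showing what Python does allow, namely mutating a shared object.

```python
def swap(a, b):
    # Rebinds only the local names a and b; the caller's variables are unaffected.
    a, b = b, a

def swap_in_place(pair):
    # Mutates the list object that the caller also refers to.
    pair[0], pair[1] = pair[1], pair[0]

x, y = 1, 2
swap(x, y)
print(x, y)  # 1 2

pair = [1, 2]
swap_in_place(pair)
print(pair)  # [2, 1]
```

The first call leaves `x` and `y` unchanged, so the test for call by reference fails. The second works only because the swap happens inside an object that caller and callee share.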