28

Java is pass-by-value. How could you modify the language to introduce passing by reference (or some equivalent behavior)?

Take for example something like

public static void main(String[] args) {
    String variable = "'previous String reference'";
    passByReference(ref variable);
    System.out.println(variable); // I want this to print 'new String reference'
}

public static void passByReference(ref String someString) {
    someString = "'new String reference'";
}

which (without the ref) compiles to the following bytecode

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String 'previous String reference'
       2: astore_1
       3: aload_1
       4: invokestatic  #3                  // Method passByReference:(Ljava/lang/String;)V
       7: return

  public static void passByReference(java.lang.String);
    Code:
       0: ldc           #4                  // String 'new String reference'
       2: astore_0
       3: return

The instruction at offset 3 (aload_1) loads the reference stored in the local variable named variable onto the operand stack.

One possibility I'm considering is to have the compiler determine that a method is pass-by-reference, possibly via a ref keyword, and change the method to accept a Holder object that stores the same reference as our variable. When the method returns, having possibly changed the reference in the holder, the caller's variable is assigned whatever reference the holder now contains.

It should compile to an equivalent of this

public static void main(String[] args) {
    String variable = "'previous String reference'";
    Holder holder = Holder.referenceOf(variable);
    passByReference(holder);
    variable = (String) holder.getReference(); // I don't think this cast is necessary in bytecode
    System.out.println(variable);
}

public static void passByReference(Holder someString) {
    someString.setReference("'new String reference'");
}

where Holder might be something like

public class Holder {
    Object reference;
    private Holder (Object reference) {
        this.reference = reference;
    }
    public Object getReference() {
        return this.reference;
    }
    public void setReference(Object reference) {
        this.reference = reference;
    }
    public static Holder referenceOf(Object reference) {
        return new Holder(reference);
    }
}

Where can this fail or how could you improve it?

Sotirios Delimanolis
  • Are you familiar with [Jasmin](http://faculty.cs.wwu.edu/meehan/Jasmin/)? I've always liked the answer there - under "Implementing Call-by-reference for your language using the JVM instruction set." Spoiler: They called "reference" - "value". – Elliott Frisch Jan 22 '14 at 04:45
  • @ElliottFrisch Thanks for the link, I wasn't familiar with Jasmin. It seems I'm suggesting something similar to their solution with wrapper classes. – Sotirios Delimanolis Jan 22 '14 at 04:49
  • Unfortunately, they haven't updated the [homepage](http://jasmin.sourceforge.net/) for almost ten years. I own the book. – Elliott Frisch Jan 22 '14 at 04:51
  • Why not just *return* the new value? Or why not use a different programming language that matches your requirements instead of trying to change this one? – Holger Jan 24 '14 at 15:02
  • @holger You might want to return another value from the method. It's a theoretical language design question. – Sotirios Delimanolis Jan 24 '14 at 15:08
  • It doesn’t sound like a theoretical language design question; your entire question is about how to implement it, though you already answer it yourself by giving an entire solution right inside the question. So what is your question actually? That you are opening a can of worms by introducing a language feature that makes local variables non-local needs no discussion. That was understood over a decade ago when Java was created and this language design decision, not to support such a thing, was made. As said, you can use a different language if you don’t like it. – Holger Jan 24 '14 at 15:20
  • I recommend an annotation to indicate that a parameter should be pass-by-ref. Since it would be hard to modify the compiler, you could write a bytecode manipulator. – tbodt Jan 25 '14 at 00:19
  • @tbodt That would look nice as well. – Sotirios Delimanolis Jan 25 '14 at 00:20
  • I may write such a bytecode manipulator. – tbodt Jan 25 '14 at 00:21
  • It would use one-element arrays instead of custom classes. – tbodt Jan 25 '14 at 00:22
  • A similar concept is used for SOAP-based web services to provide WSDL-conformant `OUT` and `INOUT` operation parameters. As parameters might get updated with new values, these changes need to be returned appropriately to the caller. Java therefore provides the [Holder](http://docs.oracle.com/javaee/7/api/javax/xml/ws/Holder.html) class, which is similar to your suggestion. So, unless you propose a language addition (maybe in a C++-like way by adding `&` at the end of a variable, or by including an explicit ref keyword), this (and the array way) will probably be the only working solutions, IMO – Roman Vottner Jan 27 '14 at 00:14
  • @RomanVottner Interesting SOAP workaround. And yes, I'm not trying to achieve this in base Java. I'm trying to see how I can modify byte code (compilation) to achieve it. – Sotirios Delimanolis Jan 27 '14 at 00:17
  • @RomanVottner, I prefer one-element arrays over generics here for two reasons: primitives would in practice be "double-boxed" since they'd have to be boxed before being treated generically, and arrays throw immediately on ill-typed assignment to their members. – Jeffrey Hantin Jan 28 '14 at 07:07

11 Answers

21

The usual idiom I've seen for pass-by-reference in Java is to pass a single-element array, which will both preserve run-time type-safety (unlike generics which undergo erasure) and avoid the need to introduce a new class.

public static void main(String[] args) {
    String[] holder = new String[1];

    // variable optimized away as holder[0]
    holder[0] = "'previous String reference'";

    passByReference(holder);
    System.out.println(holder[0]);
}

public static void passByReference(String[] someString) {
    someString[0] = "'new String reference'";
}
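
The run-time type-safety point can be illustrated with a small sketch. It assumes a hypothetical generic Holder<T> class (not taken from this answer) and compares it with a one-element array. Through a raw reference, the generic holder silently accepts a value of the wrong type, while the array checks the store immediately.

```java
class ArrayVsGenericHolder {
    // Hypothetical generic holder, used only for comparison with the array idiom.
    static final class Holder<T> { T value; }

    // Erasure: at run time the holder has no record of T, so the bad store succeeds.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean genericHolderAcceptsWrongType() {
        Holder<String> typed = new Holder<>();
        Holder raw = typed;               // raw view of the same object
        raw.value = Integer.valueOf(17);  // no exception here
        return raw.value instanceof Integer;
    }

    // Arrays remember their element type, so the bad store fails right away.
    static boolean arrayRejectsWrongType() {
        Object[] cell = new String[1];
        try {
            cell[0] = Integer.valueOf(17);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("generic holder accepted wrong type: " + genericHolderAcceptsWrongType());
        System.out.println("array rejected wrong type: " + arrayRejectsWrongType());
    }
}
```

With the generic holder, the mistake would only surface later, when the value is read back as a String.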
Jeffrey Hantin
  • This is good for base java. I'll be honest, though, I'd much rather keep my parameters the type they actually are and not have to retrieve an array element. However, if we could change the bytecode to make it equivalent to the code you're showing, then that would be fine. – Sotirios Delimanolis Jan 22 '14 at 13:44
  • @Sotirios, that's really what I meant: equivalent base Java code post-transformation. You could use a `ref` keyword on parameters at both declaration and invocation as a trigger for this transformation, and doing so would allow interoperability with existing base Java code using this idiom. – Jeffrey Hantin Jan 22 '14 at 23:26
  • Ah ok, I'll check to see how hard/possible it is to do that at the bytecode level. Thanks. – Sotirios Delimanolis Jan 22 '14 at 23:28
14

To answer your question:

Where can this fail?

  1. Final variables and enum constants
  2. 'Special' references such as this
  3. References that are returned from method calls, or constructed inline using new
  4. Literals (Strings, integers, etc.)

...and possibly others. Basically, your ref keyword must only be usable if the parameter source is a non-final field or local variable. Any other source should generate a compilation error when used with ref.

An example of (1):

final String s = "final";
passByReference(ref s);  // Should not be possible

An example of (2):

passByReference(ref this);  // Definitely impossible

An example of (3):

passByReference(ref toString());  // Definitely impossible
passByReference(ref new String("foo"));  // Definitely impossible

An example of (4):

passByReference(ref "literal");  // Definitely impossible

And then there are assignment expressions, which seem to me like something of a judgement call:

String s;
passByReference(ref (s="initial"));  // Possible, but does it make sense?

It's also a little strange that your syntax requires the ref keyword for both the method definition and the method invocation. I think the method definition would be sufficient.

Kevin K
  • Thanks for the answer. Those are the kinds of situations I was looking for. There are solutions at compile time for all of these (like C# does). Basically, you only allow references coming from non-final variables. – Sotirios Delimanolis Jan 29 '14 at 19:58
  • FYI - I'm not going to claim the list I provided is exhaustive, there may be others. I did add one additional example that's more of a grey area (assignment expressions) – Kevin K Jan 29 '14 at 20:09
  • In Java, assignment expressions used as values, like `(s = "initial")`, actually push the value of the right-hand side on the stack twice, pop once to assign to the variable, and pop again to use as the value of the expression. There's probably a way to detect that too. – Sotirios Delimanolis Jan 29 '14 at 20:12
9

Your attempt to modify the language ignores the fact that this "feature" was explicitly left out to prevent well-known side-effect bugs from happening in the first place. The recommended way to do what you are trying to achieve in Java is to use data-holder classes:

public class Holder<T> {
  protected T value;

  public T getValue() {
    return value;
  }

  public void setValue(T value) {
    this.value = value;
  }
}

A thread-safe version would be the AtomicReference.

Now, storing a single String in a class seems like overkill, and most likely it is. Usually, however, you have a data-holder class for several related values instead of a single String.

The big benefit of this approach is that what happens inside the method is very explicit. So even if you are programming on a Monday morning after an eventful weekend and the coffee machine just broke down, you still can tell easily what the code is doing (KISS), preventing several bugs from even happening in the first place, just because you forgot about that one feature of method foo.

If you think about what your approach can do that the data-holder version cannot, you'll soon realize that you are implementing something just because it is different, but effectively it has no real value.

TwoThe
  • What problems would exist with pass-by-reference which are not even worse when using holder classes? In .NET, if an object passes one of its fields to an outside method as a byref, it can be certain that anything which is going to happen to that field as a consequence of that method call will happen before it returns (unless the outside code is granted "unsafe" permissions). By contrast, once a Java reference to a mutable object has been exposed to outside code, there's no way of telling how or when or by whom the object might arbitrarily be modified any time in the future. – supercat Jan 29 '14 at 19:49
  • Assume you have the function `add(myInt)`, but unknown to you, what myInt points to after calling that function is not what it was when you handed it over. There might be reasons for that, but effectively it makes the code hard to understand and bugs difficult to find. That is why one of the core principles of Java is to be very explicit about what the code is doing and a pass-by-reference violates that. For the same reason every book on Java tells you to not make mutable class-variables public. – TwoThe Jan 30 '14 at 14:28
  • I agree with your argument about why it was left out of the language, but I don't agree with the argument about the alternative being bug-prone. The solution I'm looking for would explicitly have markers in the source code that a method is pass by reference, something like a new keyword `ref` or an annotation like `@Ref`. – Sotirios Delimanolis Jan 30 '14 at 15:27
  • Even if you have annotations, you still don't know what happens inside. This is bug-prone not because it causes bugs in general, but because there will be that one day, where you just forget about that one specialty of this function. If you have something like `myInt = add(myInt, 1)` it is __very__ obvious what this does and you don't need to actually remember what `add` might do, it is explicit enough to get it just from reading the code. Even on Mondays before your first coffee. – TwoThe Jan 30 '14 at 20:34
7

Use the AtomicReference class as the holder object.

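import java.util.concurrent.atomic.AtomicReference;

public static void main(String[] args) {
    String variable = "old";
    // Wrap the current value so the callee can replace it
    AtomicReference<String> at = new AtomicReference<String>(variable);
    passByReference(at);
    // Copy the (possibly replaced) value back into the local variable
    variable = at.get();
    System.out.println(variable);
}

public static void passByReference(AtomicReference<String> at) {
    at.set("new");
}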
Sinto
  • Is there a specific reason you would need the reference to be atomic? – Sotirios Delimanolis Jan 22 '14 at 13:41
  • Well assignment is atomic: you would probably want to preserve that property. In any case the AtomicReference class is already there. Saves you having to add a holder class. – user207421 Jan 23 '14 at 00:09
  • @EJP Sure, but what Thread could intercept the reference changing? – Sotirios Delimanolis Jan 23 '14 at 00:15
  • `AtomicReference` doesn't add value here unless you need `getAndSet` or `compareAndSet`. Also, if the variable is of primitive type, it has to be *boxed* to be treated generically; a one-element array already acts as a mutable box, so you don't end up with two references where one will suffice. – Jeffrey Hantin Jan 23 '14 at 02:02
4

Oddly enough, I've been thinking about this problem myself recently. I was considering whether it might be fun to create a dialect of VB that ran on the JVM - I decided it wouldn't be.

Anyway, there are two main cases where this is likely to be useful and well defined:

  • local variables
  • object attributes

I'm assuming that you're writing a new compiler (or adapting an existing one) for your new dialect of Java.

Local variables are typically handled by code similar to what you're proposing. I'm most familiar with Scala, which doesn't support pass-by-reference, but does support closures, which have the same issues. In Scala, there's a class scala.runtime.ObjectRef, which resembles your Holder class. There are also similar {...}Ref classes for primitives, volatile variables, and similar.

If the compiler needs to create a closure that updates a local variable, it "upgrades" the variable to a final ObjectRef (which can be passed to the closure in its constructor), and replaces uses of that variable by gets and updates by sets, on the ObjectRef. In your compiler, you could upgrade local variables whenever they're passed by reference.
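
As a rough Java illustration of that "upgrade", the sketch below shows what the rewritten code could look like. The ObjectRef class here is a hypothetical stand-in written for this example (it only mimics the role of scala.runtime.ObjectRef), and the lambda plays the part of the closure.

```java
class UpgradedLocalDemo {
    // Hypothetical stand-in for a compiler-provided reference cell.
    static final class ObjectRef<T> {
        T elem;
        ObjectRef(T elem) { this.elem = elem; }
    }

    static String run() {
        // Source-level intent: String variable = "old"; a closure assigns variable = "new".
        // After the upgrade, the local is a final cell and every read or write goes through it.
        final ObjectRef<String> variable = new ObjectRef<>("old");
        Runnable closure = () -> variable.elem = "new"; // write becomes a store into the cell
        closure.run();
        return variable.elem;                           // read becomes a load from the cell
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because every access goes through the cell, no copy-back step is needed after the call returns.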

You could use a similar trick with object attributes. Suppose that Holder implements an interface ByRef. When your compiler sees an object attribute being passed by reference, it could create an anonymous subclass of ByRef that reads and updates the object attribute in its get and set methods. Again, Scala does something similar to this for lazily evaluated parameters (like references, but read-only).
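
A minimal sketch of that idea follows, assuming a ByRef interface with get and set methods (the interface shape, the Person class, and the rename method are all made up for illustration). The anonymous class is what a compiler might generate for a call written as rename(ref p.name).

```java
// Hypothetical by-reference interface; the name comes from the answer, the shape is assumed.
interface ByRef<T> {
    T get();
    void set(T value);
}

// Hypothetical class with a field that should be passed by reference.
class Person {
    String name = "old";
}

class FieldRefDemo {
    // Callee as it might look after compilation: it only sees the ByRef.
    static void rename(ByRef<String> ref) {
        ref.set("new");
    }

    static String run() {
        final Person p = new Person();
        // Generated adapter: reads and writes go straight to the field.
        rename(new ByRef<String>() {
            @Override public String get() { return p.name; }
            @Override public void set(String value) { p.name = value; }
        });
        return p.name;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Since the adapter writes to the field directly, the update is visible as soon as the callee performs it, not only after it returns.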

For extra brownie points, you could extend the technique to JavaBean properties and even Map, List and Array elements.

One side effect of this is that at the JVM level, your methods have unexpected signatures. If you compile a method with signature void doIt(ref String), at the bytecode level, you'll end up with the signature void doIt(ByRef) (you might expect this to be something like void doIt(ByRef<String>), but of course generics use type erasure). This can cause problems with method overloading, as all by-ref parameters compile to the same signature.
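
The erased signature can be observed with reflection. This is only a sketch: ByRef is a hypothetical interface, and doIt stands for the method a compiler might emit for void doIt(ref String).

```java
import java.lang.reflect.Method;

class ErasureDemo {
    // Hypothetical by-reference interface.
    interface ByRef<T> {
        T get();
        void set(T value);
    }

    // What the compiler might emit for: void doIt(ref String s)
    static void doIt(ByRef<String> s) { }

    // An overload for ref Integer would be rejected by javac,
    // because both methods erase to doIt(ByRef):
    // static void doIt(ByRef<Integer> i) { }

    // Looks the method up by its erased parameter type and reports that type's simple name.
    static String erasedParameterType() {
        try {
            Method m = ErasureDemo.class.getDeclaredMethod("doIt", ByRef.class);
            return m.getParameterTypes()[0].getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedParameterType());
    }
}
```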

It may be possible to do this with bytecode manipulation, but there are pitfalls, like the fact that the JVM permits applications to re-use local variables - so at the bytecode level, it may not be clear whether a parameter is being re-assigned, or its slot re-used, if the application was compiled without debugging symbols. Also, the compiler may elide aload instructions if there's no possibility of a value having changed within the outer method - if you don't take steps to avoid this, changes to your reference variable may not be reflected in the outer method.

James_pic
1

Think about how it might be implemented with a primitive type, say int. Java - the JVM, not just the language - does not have any "pointer" type to a local variable, on the frame (method stack) or the operand stack. Without that, it is not possible to truly pass by reference.

Other languages that support pass-by-reference use pointers under the hood (at least I believe so; I don't see any other possibility). C++ references (like int&) are pointers in disguise.

I've thought of creating a new set of classes that extend Number, containing int, long, etc. but not immutable. This could give some of the effect of passing primitives by reference - but they won't be auto-boxed, and some other features might not work.
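
A minimal sketch of such a class, assuming a hypothetical name IntCell (it is not a JDK class), might look like this:

```java
// Hypothetical mutable counterpart to Integer; extends Number as suggested above.
class IntCell extends Number {
    private int value;

    IntCell(int value) { this.value = value; }

    void set(int value) { this.value = value; }

    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }
}

class IntCellDemo {
    // The callee mutates the cell; the caller sees the change through the shared object.
    static void increment(IntCell n) {
        n.set(n.intValue() + 1);
    }

    static int run() {
        IntCell n = new IntCell(41); // no auto-boxing: the cell must be created explicitly
        increment(n);
        return n.intValue();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

As noted, the caller has to create and unwrap the cell by hand, because the language only auto-boxes to the built-in wrapper types.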

Without support in the JVM, you can't have real pass-by-reference. Sorry, but that's my understanding.

BTW, there are already several reference-type classes (like the one you'd like for Holder): ThreadLocal<> (which has get() and set()), or the Reference subclasses, like WeakReference (which I think only have get()).

Edit: After reading some other answers, I'd suggest that ref be a form of auto-boxing. Thus:

class ReferenceHolder<T> {
    T referent;
    static <T> ReferenceHolder<T> valueOf(T object) {
        return new ReferenceHolder<T>(object);
    }
    ReferenceHolder(T object) { referent = object; }
    T get()            { return referent; }
    void set(T value)  { referent = value; }
}

class RefTest {
    static void main() {
        String s = "Hello";
        // This is how it is written...
        change(s);
        // but the compiler converts it to...
        ReferenceHolder<String> $tmp = ReferenceHolder.valueOf(s);
        change($tmp);
        s = $tmp.get();
    }
    // This is how it is written...
    static void change(ref Object s) {
        s = "Goodbye";              // won't work
        s = 17;             // *Potential ClassCastException, but not here*
    }
    // but the compiler converts it to:
    static <T> void change(ReferenceHolder<T> obj) {
        obj.set((T) "Goodbye");     // this works
        obj.set((T) 17);    // *Compiler can't really catch this*
    }
}

But do you see the potential for putting the wrong type in the ReferenceHolder? If genericized properly, the compiler may be able to warn sometimes, but as you likely want the new code to resemble normal code as much as possible, there is the possibility of a ClassCastException with each auto-ref call.

Menachem
  • The JVM doesn't matter if the byte code is effectively doing the job. If at compilation time I can process annotations (or new keywords like `ref`), then I could create (or modify) the byte code so that it is equivalent to passing by reference. – Sotirios Delimanolis Jan 30 '14 at 15:14
  • So, you're considering wrapping a `ref int` in a `SignedInt` (for example), passing a reference (by value) to that object and modifying the value stored in the object. Then, when the method returns, you copy the wrapped (and modified) value back into the local variable. I guess the crux of the issue is this last assignment. A sort of auto-boxing and auto-unboxing, which is where you can look for ideas (or, you probably already have). – Menachem Jan 30 '14 at 16:08
  • Yes, that's what I want to eventually transform the byte code to. – Sotirios Delimanolis Jan 30 '14 at 16:10
1

I think you can accomplish most of what you want by building an agent and using cglib.

Many of the examples given here can work. I'd recommend using the template you proposed because it will compile with the normal compiler.

public void doSomething(@Ref String var)

Then, behind the scenes, you use cglib to rewrite the annotated methods, which is easy. You'll also have to rewrite the callers, which I think will be much more complicated in cglib. javassist takes a more "source code"-oriented approach and might be better suited for rewriting the callers.
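
The annotation side of this can be sketched without any bytecode library. The @Ref annotation and the RefScanner class below are hypothetical, and the sketch only finds the annotated parameters with plain reflection; the rewriting step itself is not shown. RUNTIME retention is an assumption made so that reflection can see the annotation; a tool that reads class files directly could work with CLASS retention.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Hypothetical marker for parameters that should be passed by reference.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Ref { }

class RefScanner {
    // Example method that compiles with the normal compiler.
    public void doSomething(@Ref String var, int count) { }

    // Counts the @Ref parameters of every method with the given name:
    // these are the parameters a rewriting tool would have to transform.
    static int countRefParameters(Class<?> type, String methodName) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            for (Parameter p : m.getParameters()) {
                if (p.isAnnotationPresent(Ref.class)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRefParameters(RefScanner.class, "doSomething"));
    }
}
```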

Ted Bigham
0

Answering your question about how to extend the language, my picks would be:

  • Use one of the holder techniques that several other answers describe.
  • Use annotations to attach metadata about which arguments should be passed by reference, and then use a bytecode manipulation library such as cglib to implement the idea in the bytecode itself.

Though this whole idea seems strange.

aljipa
0

There are several ways to write Java code as effectively pass-by-reference, even within the standard pass-by-value conventions.

One approach is to use instance or static variables whose scope includes a particular method, in lieu of explicit parameters. The variables which are being modified could be included in the comments, if you really want to mention their names at the beginning of a method.

The disadvantage with this approach is that the scope of these variables would need to encompass the entire class in question, rather than only the method. If you would like to restrict the variables' scopes more precisely, you could always modify them using getter and setter methods rather than as parameters.

Having worked with both Java and C/C++, I don't think Java's supposed inflexibility in being pass-by-value only is a big deal--for any programmers who know what happens to the variables, there are reasonable workarounds that can accomplish the same things functionally.

La-comadreja
  • I'm not trying to do this in plain Java. I'm trying to figure some ways to manipulate byte code to achieve it, possibly without affecting Java's syntax and/or the types themselves. For example, if the method is declared as `public void doSomething(@Ref String var)`, then the argument should be passed by reference. I don't want to change the source code to look like `public void doSomething(Holder var)`. – Sotirios Delimanolis Jan 27 '14 at 16:58
0

The question is about the language itself, but the answers seem to mention practical tricks, so I'm adding to the list.

One option is to hold data in atomics, such as AtomicReference<T>:

  • new AtomicReference<>("my data")

Also, it seems that Pair and other sorts of tuples are good holders, so it often happens that you already have a holder in your project.

  • Tuples.of(....) (reactor.util.function.Tuples)
  • Pair.of(a, b) (spring, apache commons etc)
  • AbstractMap.SimpleEntry (jdk)
  • new Object[]{"my data"} can be a holder too
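
As a small sketch of the JDK option from the list, AbstractMap.SimpleEntry can act as a holder because its setValue method replaces the stored value. The class and method names below are made up for illustration. Whichever type is used has to allow its contents to be replaced, so immutable pairs or tuples would not work as output parameters.

```java
import java.util.AbstractMap;
import java.util.Map;

class EntryHolderDemo {
    // The callee replaces the value stored in the entry.
    static void overwrite(Map.Entry<String, String> out) {
        out.setValue("new");
    }

    static String run() {
        // The key is unused here; only the mutable value slot matters.
        Map.Entry<String, String> holder = new AbstractMap.SimpleEntry<>("variable", "old");
        overwrite(holder);
        return holder.getValue();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```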
Lauri
-1

Java is (in fact) pass by reference. When the method is called, the reference(pointer) to the object is passed and when you modify the object you can see the modification when you return from the method. The problem with your example is that java.lang.String is immutable.

And what you are actually achieving with your example is output parameters.

Here is a slightly different version of Jeffrey Hantin's answer:

public static void main(String[] args) {
  StringBuilder variable = new StringBuilder("'previous String reference'");
  passByReference(variable);
  System.out.println(variable); // I want this to print 'new String reference'
}

public static void passByReference(StringBuilder someString) {
  String nr = "'new String reference'";
  // Replace the entire current content; the end index of replace() is exclusive
  someString.replace(0, someString.length(), nr);
}
TomWolk
  • No, Java is very much pass by value. The reference of the object isn't passed. A copy of the value of the reference to the object is passed. In your example, in `passByReference`, if you changed the reference of `someString` to a new object, ie. `someString = new StringBuilder()`, that would not be visible from the calling code. I want to extend the language so that that change _is_ visible. – Sotirios Delimanolis Jan 30 '14 at 15:07
  • Also, I want the types of the variables in the source code to remain the same. They can be changed in the byte code if necessary. – Sotirios Delimanolis Jan 30 '14 at 15:08
  • Sorry, -1 because of "Java is (in fact) pass by reference." – Koray Tugay Apr 13 '14 at 18:24