I am currently developing a custom programming language for the JVM, and I would like the language to support by-reference parameters in methods. How would I go about doing that? So far, I have come up with three different ways to accomplish this.

  1. Wrapper Objects

The idea is to create a wrapper object containing the current value of the field, pass it to the by-ref method call, and unbox it after the call. This is fairly straightforward, but it creates a lot of 'garbage' objects that are immediately discarded.
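As a rough illustration, the code a compiler might emit for this variant could look like the sketch below. `IntRef`, `increment` and `callByRef` are hypothetical names chosen for the example, and it assumes one holder class per primitive type.

```java
// Hypothetical desugaring of a call like `increment(ref counter)`.
class WrapperRefDemo {
    static int counter = 41; // the field passed by reference

    // Callee compiled from a method with a by-ref int parameter.
    static void increment(IntRef ref) {
        ref.value++;
    }

    // What the compiler would emit at the call site.
    static int callByRef() {
        IntRef box = new IntRef(counter); // copy the field into the wrapper
        increment(box);                   // pass the wrapper instead of the value
        counter = box.value;              // copy the result back into the field
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(callByRef());
    }
}

// Mutable holder for a single int (assumed: one such class per primitive type).
final class IntRef {
    int value;

    IntRef(int value) {
        this.value = value;
    }
}
```

The `box` object never leaves `callByRef`, which is what makes it short-lived garbage.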

  2. Arrays

Simply create a one-element array of the type, put the field value in the array, call the method passing the array, and finally assign the field from the array. The nice thing about this is that it ensures runtime type safety, unlike a generic wrapper class, which would require additional casts.
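A minimal sketch of the array variant, with hypothetical names (`rename`, `callByRef`, `rejectsWrongType`). The second method shows the runtime type check that a `String[]` performs even when it is accessed through an `Object[]` reference.

```java
class ArrayRefParamDemo {
    static String name = "old"; // the field passed by reference

    // Callee compiled from a method with a by-ref String parameter.
    static void rename(String[] ref) {
        ref[0] = "new";
    }

    // What the compiler would emit at the call site.
    static String callByRef() {
        String[] slot = { name }; // copy the field into a one-element array
        rename(slot);
        name = slot[0];           // copy back, no cast required
        return name;
    }

    // Arrays know their element type at runtime, so a wrong store is rejected.
    static boolean rejectsWrongType() {
        Object[] alias = new String[] { name };
        try {
            alias[0] = Integer.valueOf(1); // not a String
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callByRef());
        System.out.println(rejectsWrongType());
    }
}
```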

  3. Unsafe

This one is slightly more advanced: Use sun.misc.Unsafe to allocate some native memory space, store the field value on that memory, call the method and pass the address, re-assign the field from the native memory address, and free it up again.
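For a primitive field, this idea might be sketched as follows. It assumes `sun.misc.Unsafe` is reachable through the usual reflective `theUnsafe` lookup; the memory-access methods used here are deprecated on newer JDKs, so treat it as an illustration rather than a recommendation. The class and method names are made up for the example.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class NativeSlotDemo {
    static int counter = 41; // the field passed by reference

    // Obtains the Unsafe singleton reflectively (assumed to be permitted).
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Callee receives a raw native address instead of a value.
    static void increment(Unsafe u, long address) {
        u.putInt(address, u.getInt(address) + 1);
    }

    // What the compiler would emit at the call site.
    static int callByRef() {
        Unsafe u = unsafe();
        long address = u.allocateMemory(Integer.BYTES);
        try {
            u.putInt(address, counter);  // store the field value off-heap
            increment(u, address);       // pass the address
            counter = u.getInt(address); // re-assign the field
        } finally {
            u.freeMemory(address);       // always release the native slot
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(callByRef());
    }
}
```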

Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?

Clashsoft
  • Related: http://stackoverflow.com/questions/21274346/how-can-you-extend-java-to-introduce-passing-by-reference – Sotirios Delimanolis Apr 10 '15 at 18:16
  • Note that arrays are also objects, so the solution with arrays will also create short-lived, temporary objects just like the solution with wrapper objects. – Jesper Apr 10 '15 at 18:31

2 Answers

Wrapper Objects [...] but requires a lot of 'garbage' objects that are created and immediately discarded.

If the lifetime of such a wrapper is limited to a call site (plus the inlined callee), then the JIT compiler may be able to prove that through escape analysis and avoid the allocation by decomposing the wrapper object into its primitive members and using them directly in the generated code.

That essentially requires that those reference wrappers are never stored to fields and are only passed as method arguments.

Unsafe [...] Use sun.misc.Unsafe to allocate some native memory space, store the field value on that memory

You cannot store object-references in native memory. The garbage collector would not know about it and thus could change the memory address under your feet or GC the object if that is your only reference.

But since you're creating your own language you could simply desugar field references into object references + an offset. I.e. pass two parameters (object ref + long offset) instead of one. If you know the offset you can use Unsafe to manipulate the field.

Obviously this will only work for object fields. Local references cannot be changed this way.
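The "object reference + offset" desugaring could be sketched like this. The names are hypothetical, the `Unsafe` instance is again obtained reflectively, and `objectFieldOffset` is deprecated on newer JDKs, so this is only an illustration of the idea.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class FieldOffsetDemo {
    int counter = 41; // the instance field passed by reference

    static final Unsafe U = loadUnsafe();

    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // By-ref int parameter desugared into two parameters: owner and offset.
    static void increment(Object owner, long offset) {
        U.putInt(owner, offset, U.getInt(owner, offset) + 1);
    }

    // What the compiler would emit at the call site.
    static int callByRef(FieldOffsetDemo target) {
        try {
            Field field = FieldOffsetDemo.class.getDeclaredField("counter");
            long offset = U.objectFieldOffset(field); // could be cached per field
            increment(target, offset);                // writes hit the field directly
            return target.counter;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByRef(new FieldOffsetDemo()));
    }
}
```

Because the owner is passed as a normal object reference, the garbage collector still sees it, and no copy-back step is needed.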

Bonus: Is it possible to implement pointers and pointer arithmetic using the Unsafe class?

Yes, for unmanaged memory.

For memory within the managed heap you are only allowed to point to objects themselves and do pointer arithmetic relative to the object header.
And you always must store object references in Object-typed fields. Storing them in a long would lead to GC-implementations (precise ones at least) missing the reference.


Edit: You might also be interested in ongoing work in the JDK regarding VarHandles. It's something you probably want to keep in mind when developing your language.
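On Java 9 and later, `java.lang.invoke.VarHandle` is available, and a field reference could be passed as a handle plus the owning object. A small sketch with hypothetical class and method names:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class VarHandleRefDemo {
    int counter = 41; // the instance field passed by reference

    // By-ref int parameter desugared into (handle, owner).
    static void increment(VarHandle handle, VarHandleRefDemo owner) {
        handle.set(owner, (int) handle.get(owner) + 1);
    }

    // What the compiler would emit at the call site.
    static int callByRef(VarHandleRefDemo target) {
        try {
            VarHandle handle = MethodHandles.lookup()
                    .findVarHandle(VarHandleRefDemo.class, "counter", int.class);
            increment(handle, target);
            return target.counter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByRef(new VarHandleRefDemo()));
    }
}
```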

the8472

It seems you have missed an important point about the pass-by-reference concept: whenever a write into the reference happens, the referenced variable is updated immediately. This is different from any concept like yours, which actually passes a copy in a holder and updates the original variable upon method return.

You can notice the difference even in a single-threaded use case:

foo(myField, ()-> {
    // if myField is pass-by-reference, whenever foo() modifies
    // it and calls this Runnable, it should see the new value:
    System.out.println(myField);
});
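To make the difference concrete in plain Java, here is a sketch that runs the same scenario twice: once with a copy-in/copy-out holder and once with an accessor that writes straight to the field. `IntAccessor` and the method names are invented for the example.

```java
class CopyVersusAliasDemo {
    static int myField = 0;

    // Callee using a copied holder: writes to the copy, then runs the callback.
    static void fooCopy(int[] ref, Runnable callback) {
        ref[0] = 42;
        callback.run();
    }

    // Callee using an accessor that targets the field itself.
    static void fooRef(IntAccessor ref, Runnable callback) {
        ref.set(42);
        callback.run();
    }

    // Returns what the callback observes with copy-in/copy-out.
    static int observedWithCopy() {
        myField = 0;
        int[] seen = new int[1];
        int[] box = { myField };
        fooCopy(box, () -> seen[0] = myField); // callback runs before the copy-out
        myField = box[0];
        return seen[0]; // the stale value
    }

    // Returns what the callback observes with a real reference.
    static int observedWithAccessor() {
        myField = 0;
        int[] seen = new int[1];
        IntAccessor ref = new IntAccessor() {
            public int get() { return myField; }
            public void set(int value) { myField = value; }
        };
        fooRef(ref, () -> seen[0] = myField);
        return seen[0]; // the updated value
    }

    public static void main(String[] args) {
        System.out.println(observedWithCopy());
        System.out.println(observedWithAccessor());
    }
}

// Hypothetical read/write accessor for an int variable.
interface IntAccessor {
    int get();
    void set(int value);
}
```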

Of course, you could make both references access the same wrapper, but in an environment allowing (almost) arbitrary code, that would imply replacing every reference to the field with the wrapper (in the end, changing the contents of the field itself).


So if you want to implement a clean, real pass-by-reference mechanism within the JVM, it must be able to modify the referenced artifact, i.e. a field or array slot. For local variables, there is no way to do that, so there’s no way around replacing a local variable with a holder object once a reference to it has been created.

So the available options are already known: you can pass a java.lang.reflect.Field (which does not work with array slots), a pair of java.lang.invoke.MethodHandle instances, or an arbitrary typed object (of a generated type) offering read and write access.
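The MethodHandle pair option might look roughly like this, using `findGetter` and `findSetter` bound to the owning instance. The class and method names are placeholders for whatever the compiler would generate.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class HandlePairDemo {
    int counter = 41; // the instance field passed by reference

    // By-ref int parameter desugared into a getter/setter pair.
    static void increment(MethodHandle getter, MethodHandle setter) throws Throwable {
        int current = (int) getter.invoke();
        setter.invoke(current + 1); // writes straight into the field
    }

    // What the compiler would emit at the call site.
    static int callByRef(HandlePairDemo target) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter = lookup
                    .findGetter(HandlePairDemo.class, "counter", int.class)
                    .bindTo(target);
            MethodHandle setter = lookup
                    .findSetter(HandlePairDemo.class, "counter", int.class)
                    .bindTo(target);
            increment(getter, setter);
            return target.counter;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByRef(new HandlePairDemo()));
    }
}
```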

When implementing this reference accessor type, you can resort to Unsafe to create an anonymous class, just like Java’s lambda expression facility does. In fact, you can draw a lot of inspiration from the lambda expression mechanism:

  • Put an invokedynamic instruction at the place where a reference has to be created, pointing to your factory method and providing a handle to the field or array slot
  • Let the factory analyze the handle and dynamically create the accessor implementation, the main difference being that your type will have two operations, read and write
  • Use Unsafe to create that class (which might access the field, even if it’s private)
  • If the field is static, create an instance and return a CallSite with a handle returning that instance
  • Otherwise return a CallSite with a handle pointing to the constructor of the accessor class accepting an object instance or an array

This way you will only have an overhead at first-time usage, while subsequent uses will either use a singleton in the case of static fields or construct an accessor on the fly for instance fields and array slots. The creation of these accessor instances can be elided by HotSpot’s escape analysis if they are used frequently, just like with ordinary objects.
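Java source cannot emit invokedynamic directly, so the following sketch only simulates the instance-field case by calling a bootstrap method by hand. It also substitutes a hand-written, MethodHandle-based accessor (`HandleIntRef`) for the runtime-generated class described above. All names are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class RefBootstrapDemo {
    int counter = 41; // the instance field passed by reference

    // Bootstrap method an invokedynamic instruction would name. The two
    // handles stand in for the static arguments describing the field.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type,
                              MethodHandle getter, MethodHandle setter)
            throws ReflectiveOperationException {
        MethodHandle ctor = lookup.findConstructor(HandleIntRef.class,
                MethodType.methodType(void.class,
                        MethodHandle.class, MethodHandle.class, Object.class));
        // Pre-bind the field handles; the remaining argument is the owner.
        MethodHandle factory = MethodHandles.insertArguments(ctor, 0, getter, setter);
        return new ConstantCallSite(factory.asType(type));
    }

    // Callee with a by-ref int parameter.
    static void increment(IntRefAccessor ref) {
        ref.set(ref.get() + 1);
    }

    // Simulates linking and executing the invokedynamic call site.
    static int callByRef(RefBootstrapDemo target) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            CallSite site = bootstrap(lookup, "ref",
                    MethodType.methodType(IntRefAccessor.class, RefBootstrapDemo.class),
                    lookup.findGetter(RefBootstrapDemo.class, "counter", int.class),
                    lookup.findSetter(RefBootstrapDemo.class, "counter", int.class));
            IntRefAccessor ref = (IntRefAccessor) site.dynamicInvoker().invoke(target);
            increment(ref);
            return target.counter;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByRef(new RefBootstrapDemo()));
    }
}

// Accessor type with the two operations, read and write.
interface IntRefAccessor {
    int get();
    void set(int value);
}

// Hand-written stand-in for the dynamically generated accessor class.
final class HandleIntRef implements IntRefAccessor {
    private final MethodHandle getter;
    private final MethodHandle setter;
    private final Object owner;

    HandleIntRef(MethodHandle getter, MethodHandle setter, Object owner) {
        this.getter = getter;
        this.setter = setter;
        this.owner = owner;
    }

    public int get() {
        try {
            return (int) getter.invoke(owner);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public void set(int value) {
        try {
            setter.invoke(owner, value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

In a real compiler the bootstrap would run once per call site at link time, and only the bound constructor handle would execute on later calls.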

Holger
  • This makes sense for object fields since they might be used in the method that has the by-ref parameter, but shouldn't it be safe to use one of my ideas for local variables, especially seeing that you don't have an actual `Field` or offset for these? – Clashsoft Apr 10 '15 at 19:02
  • In the case of local variables, every non-local access (imagine `myField` in my code example is a local variable…) has to access a wrapper and if you ensure that they all are accessing the same wrapper, it’ll work, assuming that your language doesn’t support storing references into other variables that might have a longer lifetime. In that case, you can turn back to pure local variable usage upon method return. That would be close to your variants 1. and 2. (as explained in [the other answer](http://stackoverflow.com/a/29568556/2711488) 3. wouldn’t work) – Holger Apr 10 '15 at 19:07
  • Concerning your `invokedynamic` approach, something that comes to my mind would be the handling of arrays. Wouldn't it be sufficient to pass some sort of `ArrayRef` instance that holds the array itself and the index of the array, instead of having an entire Anonymous Class dedicated to the array access? (The same concept could be used with fields, but using reflection with the field name instead of array access inside the `XRef` class) – Clashsoft May 10 '15 at 11:50
  • The “entire Anonymous Class dedicated to the array access” is nothing else than your “`ArrayRef` instance that holds the array itself and the index of the array”— except that it doesn’t have a name. Your bootstrap method still may remember the type after its first creation and reuse for subsequent array accessor creations. And of course, you can use Reflection for fields (named as first and second options in my answer), the third option of generating a class is just for maximum performance. – Holger May 11 '15 at 07:57
  • Using Reflection doesn’t conflict with using `invokedynamic`; since this approach does not mandate a particular implementation, the bootstrap method can also return Reflection based accessors or types known at compile time like the array accessors. But keep in mind that even if you treat object arrays generically, there are eight primitive type arrays requiring specialized types (unless you want to use boxing everywhere). Since these types have a very simple structure, I’d prefer to generate them all at runtime, instead of having to maintain materialized files for them. – Holger May 11 '15 at 08:01
  • >(which might access the field, even if its private) Wouldn't the Verifier stop me from accessing a `private` field like this? – Clashsoft May 22 '15 at 11:46
  • Classes created via `Unsafe.defineAnonymousClass` have lesser access restrictions; the Verifier knows that. This is how the function objects for lambda expressions work as the synthetic methods hosting the lambda body’s code are usually `private`. See also https://blogs.oracle.com/jrose/entry/anonymous_classes_in_the_vm – Holger May 22 '15 at 11:54
  • Another solution for Fields that I could think of would be to pass an Object that holds the instance (or the static field base for static fields) and the address of the field relative to the instance. Since the `Ref` objects have `get` and `set` methods anyway, the implementation of those would delegate to an `Unsafe.getX` or `putX` call with the stored instance / field base and offset. Since this involves `native` calls to `Unsafe` *at every access point*, I am not 100% sure if that would outrun your anon class approach (but certainly the reflection one). – Clashsoft May 22 '15 at 18:54
  • I would expect these `native` methods to be handled like intrinsics by the JVM so their performance would be on par with the ordinary field access within the anonymous class, however, that code would have a much higher risk regarding potential programming errors and it would depend much deeper on the internal, perhaps changing API. – Holger May 22 '15 at 19:02
  • That might be true, but it seems like it would be *a lot* easier to implement and maintain. The Ref classes themselves would have like 20 lines of code and can be easily hand-written (of course multiple to support primitive types). The only thing the compiler has to do is generate a constructor call for these classes at the call site, and the offsets and static field bases would be stored in static synthetic fields that are initialized when the class is loaded. – Clashsoft May 22 '15 at 19:20
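
The `ArrayRef` idea from the comments, a named accessor holding the array and the index, could be sketched as below for `int[]`. `IntArrayRef` is a hypothetical name, and one such class per primitive element type is assumed.

```java
class ArrayRefDemo {
    // Callee with a by-ref int parameter.
    static void increment(IntArrayRef ref) {
        ref.set(ref.get() + 1);
    }

    // What the compiler would emit for passing data[index] by reference.
    static int callByRef(int[] data, int index) {
        increment(new IntArrayRef(data, index));
        return data[index];
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3 };
        System.out.println(callByRef(data, 1));
    }
}

// Accessor for one slot of an int array; reads and writes hit the array itself.
final class IntArrayRef {
    private final int[] array;
    private final int index;

    IntArrayRef(int[] array, int index) {
        this.array = array;
        this.index = index;
    }

    int get() {
        return array[index];
    }

    void set(int value) {
        array[index] = value;
    }
}
```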