
All this is with the new JDK 17.

I'm attempting to turn an on-heap byte array into a MemorySegment and pass that to a native function. I created simple sample code that shows this:

        final CLinker cLinker = CLinker.getInstance();
        
        // size_t strlen(const char *str); modelled as int here, which is enough for short strings
        final Optional<MemoryAddress> oSymbolAddress = CLinker.systemLookup().lookup("strlen");
        
        final MethodHandle mh = cLinker.downcallHandle(oSymbolAddress.get(),
                MethodType.methodType(int.class, MemoryAddress.class),
                FunctionDescriptor.of(C_INT, C_POINTER));        
        
        out.println("I found this method handle: " + mh);
        final byte[] ba = new byte[100];
        ba[0] = 'h';
        ba[1] = 'e';
        ba[2] = 'l';
        ba[3] = 'l';
        ba[4] = 'o';
        ba[5] = 0;
        
        final MemorySegment stringSegment = MemorySegment.ofArray(ba);
        final int result = (Integer) mh.invoke(stringSegment.address());
        
        out.println("The length of the string is: " + result);

It tries to run but it throws:

Exception in thread "main" java.lang.UnsupportedOperationException: Not a native address
    at jdk.incubator.foreign/jdk.internal.foreign.MemoryAddressImpl.toRawLongValue(MemoryAddressImpl.java:91)

If instead of using MemorySegment.ofArray(ba), I use this:

        final MemorySegment stringSegment = MemorySegment.allocateNative(100, newImplicitScope());
        stringSegment.asByteBuffer().put(ba);

it works and gives the expected answer (5).

I looked up this function in MemoryAddressImpl.java and I can see:

    @Override
    public long toRawLongValue() {
        if (segment != null) {
            if (segment.base() != null) {
                throw new UnsupportedOperationException("Not a native address");
            }
            segment.checkValidState();
        }
        return offset();
    }

Clearly segment.base() is returning a non-null value (presumably the backing heap array), which is what triggers the exception.
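To make that check concrete, here is a toy model of it. This is only a sketch, not the JDK's implementation: the class `ToyAddress` and its factory methods are hypothetical names invented for illustration. It assumes the convention suggested by the snippet above, namely that a heap-backed address carries a non-null base object plus an offset into it, while a native address has no base and its offset is the raw address.

```java
// Toy model of the base/offset check; hypothetical names, not the JDK's actual classes.
final class ToyAddress {
    private final Object base;  // backing heap object, or null for native memory
    private final long offset;  // offset within base, or the absolute address when base is null

    private ToyAddress(Object base, long offset) {
        this.base = base;
        this.offset = offset;
    }

    // Models an address derived from an on-heap array.
    static ToyAddress ofHeapArray(byte[] array, long offset) {
        return new ToyAddress(array, offset);
    }

    // Models an address that points into native (off-heap) memory.
    static ToyAddress ofNative(long rawAddress) {
        return new ToyAddress(null, rawAddress);
    }

    // Only an address without a heap base can be turned into a raw pointer value.
    long toRawLongValue() {
        if (base != null) {
            throw new UnsupportedOperationException("Not a native address");
        }
        return offset;
    }

    public static void main(String[] args) {
        System.out.println(ofNative(0x1000L).toRawLongValue()); // prints 4096
        try {
            ofHeapArray(new byte[100], 0).toRawLongValue();
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // prints "Not a native address"
        }
    }
}
```

In this model a heap address has no stable raw value to hand out, because its only fixed coordinate is the offset relative to an object that may move.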

I don't really understand what's going on. I thought that one of the major advantages of Project Panama would be that native code could access on-heap memory, to avoid having to copy. I certainly can do an allocateNative() and then copy the byte array into that, but that's a copy that I think should be avoided.

Any ideas on this? Am I doing something wrong or misunderstanding how to use on-heap memory?

desertnaut
user2959589
  • Check whether `segment` is initialised beforehand; if it isn't, that could be one of the reasons for `segment.base` returning null. In your current code I can only find `stringSegment` being initialised. It's hard to pinpoint without the full code – Archana David Nov 11 '21 at 11:14
  • Without pinning the on-heap array, the garbage collector could move it somewhere else at any time. With Java objects/references this wouldn't be a problem since the GC manages them all, but with native code the GC has no way to rewrite the pointer. That might be why the JEP 412 API explicitly disallows this unless an off-heap copy is made. – prunge Nov 12 '21 at 10:55

1 Answer


I thought that one of the major advantages of Project Panama would be that native code could access on-heap memory, to avoid having to copy.

Actually, despite the major usability improvements that come with Project Panama, this isn't possible, for several reasons.

  • The GC moves things around in the Java heap, so the address of any object (including arrays) may change over time. Native code, however, is given a raw pointer to a memory address, and that pointer is of course not updated after a GC cycle (not to mention the problem of accessing that memory in the middle of a cycle).
  • JNI has APIs that actually prevent the GC from running while native code is executing, through Get*Critical sections. Unfortunately, holding off the GC can have a significant impact on application performance.

In fact, avoiding blocking the GC is exactly what Project Panama is trying to do. This is why there's a clear separation between heap and native memory, and why it is necessary to copy to and from native memory.
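The copy step can be illustrated without the incubator module, using the older NIO API, where the same split exists between heap-backed and direct (off-heap) buffers. This is a minimal sketch under that analogy, not Panama code: `OffHeapCopy` and `copyToDirect` are hypothetical names, and a direct `ByteBuffer` stands in for a native segment.

```java
import java.nio.ByteBuffer;

// Sketch of the "copy heap data to off-heap memory" step using plain NIO buffers.
final class OffHeapCopy {

    // Copies a heap array into a newly allocated direct (off-heap) buffer.
    static ByteBuffer copyToDirect(byte[] heapBytes) {
        ByteBuffer direct = ByteBuffer.allocateDirect(heapBytes.length);
        direct.put(heapBytes); // the one copy the answer says is needed
        direct.flip();         // rewind so the contents can be read from position 0
        return direct;
    }

    public static void main(String[] args) {
        byte[] ba = new byte[100];
        byte[] hello = {'h', 'e', 'l', 'l', 'o'};
        System.arraycopy(hello, 0, ba, 0, hello.length);

        ByteBuffer heap = ByteBuffer.wrap(ba);   // still backed by the Java array
        ByteBuffer direct = copyToDirect(ba);    // lives outside the Java heap

        System.out.println(heap.isDirect());    // prints false
        System.out.println(direct.isDirect());  // prints true
        System.out.println(direct.remaining()); // prints 100
    }
}
```

The wrapped buffer shares storage with the array and so remains subject to GC relocation, while the direct buffer has a fixed location, which is why only the latter kind of memory is suitable for handing to native code.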

This shouldn't be much of an issue unless this is a hot code path (i.e. it's called very often) or the code deals with very large data. In such cases the code may want to do most of its work off-heap. If the data is in a file, consider accessing that file from native code, or use a Panama memory-mapped file:

    var big = Path.of("path/to/big.data");
    try (var scope = ResourceScope.newConfinedScope()) {
      var bigMM = MemorySegment.mapFile(big, 0, Files.size(big), FileChannel.MapMode.READ_ONLY, scope);
      return (int) mh.invoke(bigMM.address());
    }
bric3
  • Ok now that makes sense. It's by design and it can be no other way. I forgot, the GC does move things around at various times to organize memory. In this case I'll probably have to do a copy or find some other approach. I just was wanting to squeeze the maximum performance by avoiding copies, but given the GC, it can't be avoided in every case. – user2959589 Nov 12 '21 at 18:50
  • @user2959589 Careful about premature optimization. It's better to have simple and correct code first, then, if necessary, profile and tune the code where there's a problem. Anyway, if the code makes multiple native calls in a row, it's still possible to avoid copying back, i.e. just pass pointers. – bric3 Nov 12 '21 at 19:01