
As of Java 18, the incubating foreign function interface doesn't appear to have a good way to call C++ code. I am working on a project that requires bindings to a C++ library, and I would like to know how to avoid creating a thunk library in C.

One of the C++ classes looks something like this:

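#include <cstdint>
#include <memory>
#include <string>
#include <vector>

namespace library {

typedef std::uint8_t byte;

class InternalType;  // defined elsewhere in the library (not shown)

class CppClass {
 public:
  static constexpr const char* DefaultArgument = "default";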

  CppClass(const std::string& argument = DefaultArgument);
  virtual ~CppClass();

  bool doStuff();

  bool handleData(std::vector<byte>* data);

 private:
  std::unique_ptr<InternalType> internalState;
};

}

I would like to create a Java class that looks something like the following to mirror that (with error checking left out):

import java.lang.invoke.MethodHandle;
import jdk.incubator.foreign.*;

public final class CppClass implements AutoCloseable {
    public static final String DefaultArgument = "default";

    private static final MethodHandle NEW;
    private static final MethodHandle FREE;
    private static final MethodHandle DO_STUFF;
    private static final MethodHandle HANDLE_DATA;

    static {
        var binder = Natives.getBinder();
        NEW = binder.bind("(mangled constructor)", ValueLayout.ADDRESS, ValueLayout.ADDRESS);
        FREE = binder.bindVoid("(mangled destructor)", ValueLayout.ADDRESS);
        DO_STUFF = binder.bind("(mangled doStuff)", ValueLayout.JAVA_BYTE, ValueLayout.ADDRESS);
        HANDLE_DATA = binder.bind("(mangled handleData)", ValueLayout.JAVA_BYTE, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG);
    }

    // Java 18 downcalls with an ADDRESS return layout produce a MemoryAddress
    private final MemoryAddress pointer;

    public CppClass() {
        this(DefaultArgument);
    }

    public CppClass(String argument) {
        try (var scope = ResourceScope.newConfinedScope()) {
            // the native UTF-8 copy of argument is freed when the scope closes
            var allocator = SegmentAllocator.nativeAllocator(scope);
            pointer = (MemoryAddress)NEW.invokeExact(
                allocator.allocateUtf8String(argument)
            );
        }
    }

    @Override
    public void close() {
        FREE.invokeExact(pointer);
    }

    public boolean doStuff() {
        return (byte)DO_STUFF.invokeExact(pointer) != 0;
    }

    public boolean handleData(MemorySegment segment) {
        return (byte)HANDLE_DATA.invokeExact(pointer, segment.address(), segment.byteSize()) != 0;
    }
}

where Binder looks something like this:

public interface Binder {
    MethodHandle bind(String name, FunctionDescriptor desc);
    MethodHandle bind(String name, MemoryLayout result, MemoryLayout... args);
    MethodHandle bindVoid(String name, MemoryLayout... args);
}

I am not sure what parts of this are correct. My biggest implementation questions are:

  • What is the correct way to call constructors and destructors?
  • What is the correct way to call methods?
  • What is the correct way to handle the std types (std::string, std::vector)
  • Do C++ compilers add the default argument values at compile time, or do they generate multiple methods?
gudenau
  • Default arguments are effectively pasted in by the compiler when it is reading the code that makes the call. You write `CppClass example;`, the compiler turns that into `CppClass example("default");` and then compiles. That means if the C++ compiler is not generating the code for the call, all bets are off from C++'s point of view. – user4581301 Jun 14 '22 at 22:17
  • You seem to be looking for [C++ ABI (Application Binary Interface)](https://stackoverflow.com/questions/67839008/please-explain-the-c-abi). There isn't a standardized one. Each compiler defines its own, and even then it often changes from version to version. – Igor Tandetnik Jun 14 '22 at 22:17
  • So there is a pretty good chance that I just need to use a small library to convert everything. Not a huge deal, but that will be annoying to do. – gudenau Jun 14 '22 at 22:20
  • Many programming languages have accommodations for the **C ABI**, because it's fairly stable, well defined on almost every platform, the "native" ABI for many operating systems, and nigh universal. Getting FORTRAN to talk to Pascal might be done using a C ABI. Java might use JNI to talk to a C ABI that is backed by C++. – Eljay Jun 14 '22 at 22:27
  • See also [What is the effect of extern "C" in C++?](https://stackoverflow.com/q/1041866/2711488) – Holger Jun 15 '22 at 09:23
  • There is no support in the foreign linker for any C++ ABIs at this point. So, to call C++, you will need to expose functions with C linkage. – Jorn Vernee Jun 15 '22 at 19:21
  • What about support for custom (user-defined) linkers? As far as I understand it, such a linker *could* be built if someone has enough knowledge about the specific underlying naming/mangling rules. But `Linker` is sealed. – Johannes Kuhn Jun 15 '22 at 19:38
  • So the last question I have right now is, how do you correctly handle `char*` -> `std::string&`? Like, how do I manage the memory used by them to ensure they don't leak and don't cause use-after-frees? – gudenau Jun 15 '22 at 21:19
  • In C++ I think you're able to convert a char* to a string just through assignment, i.e. in the wrapper, declare an `std::string` variable, assign that the `char*` and then pass the variable to the target function. This will make a copy though. I think `std::string_view` will allow you to avoid the copy. – Jorn Vernee Jun 15 '22 at 22:06
  • @JohannesKuhn Yes, someone could build their own linker, if they were content with going through the C ABI as well, i.e. it's theoretically possible (though extremely hard) to build a C++ linker using the current C linker. There's no need to implement `Linker` for that. That's just the interface used to expose the implementations in the JDK. But there are no APIs that require being given a `Linker` instance, so it doesn't matter that it's sealed. – Jorn Vernee Jun 15 '22 at 22:10
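The first comment's point about default arguments can be seen in a small, self-contained C++ sketch. `describe` is a hypothetical function standing in for the `CppClass` constructor: calling it with no argument only works because the compiler inserts `"default"` at that call site, while a caller that has nothing but the function's address (which is all a foreign linker has) must pass the argument itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical function with a default argument, standing in for the
// CppClass constructor from the question.
std::string describe(const std::string& argument = "default") {
  return "argument=" + argument;
}

// The compiler rewrites this call into describe("default") here, at the
// call site; nothing about the default is stored in describe itself.
std::string callWithDefault() {
  return describe();
}

// The default is not part of describe's type: a caller holding only the
// function's address has to supply every argument explicitly.
std::string callThroughPointer() {
  std::string (*fn)(const std::string&) = &describe;
  // fn();  // would not compile: no default argument through a pointer
  return fn("default");
}
```

This is why mirroring the default as a constant on the Java side, as `DefaultArgument` does in the question, and passing it explicitly is a workable approach.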

1 Answer


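So the general answer seems to be "just create a shim library" that exposes functions with C linkage (`extern "C"`), because the C++ ABI is not standardized (it differs between compilers and even between compiler versions) and Java's foreign linker does not support it.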
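A minimal sketch of such a shim, assuming a stand-in `library::CppClass` (its member bodies are made up so the code compiles on its own; a real shim would include the library's header instead) and hypothetical exported names `cppclass_new`, `cppclass_free` and `cppclass_do_stuff`. The constructor and destructor are plain `new`/`delete`, the object reaches Java only as an opaque `void*`, and the method receives `this` as its first argument.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for library::CppClass so this sketch is self-contained.
namespace library {
class CppClass {
 public:
  static constexpr const char* DefaultArgument = "default";
  explicit CppClass(const std::string& argument = DefaultArgument)
      : argument_(argument) {}
  virtual ~CppClass() = default;
  bool doStuff() { return !argument_.empty(); }

 private:
  std::string argument_;
};
}  // namespace library

// extern "C" disables name mangling, so each function is exported under its
// plain name and uses the platform's C calling convention.
extern "C" {

// Constructor: the C++ compiler fills in the default argument here.
void* cppclass_new() {
  try {
    return new library::CppClass();
  } catch (...) {
    return nullptr;  // never let a C++ exception unwind into the JVM
  }
}

// Destructor: delete through the real type so the virtual ~CppClass() runs.
void cppclass_free(void* self) {
  delete static_cast<library::CppClass*>(self);
}

// Method: `this` arrives as the opaque pointer and is cast back. The bool is
// returned as a uint8_t to match a JAVA_BYTE result layout.
std::uint8_t cppclass_do_stuff(void* self) {
  assert(self != nullptr && "handle must come from cppclass_new");
  return static_cast<library::CppClass*>(self)->doStuff() ? 1 : 0;
}

}  // extern "C"
```

With the question's `Binder`, the placeholders would then become plain names, e.g. `binder.bind("cppclass_new", ValueLayout.ADDRESS)` and `binder.bindVoid("cppclass_free", ValueLayout.ADDRESS)`.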

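As for the questions at the end:

  • Constructors and destructors: call them normally (`new`/`delete`) inside the shim and hand the object to Java as an opaque `void*`.
  • Methods: pass `this` in as a `void*`, treat it as an opaque pointer on the Java side, and cast it back to the class type inside the shim.
  • std types: convert them in the shim. From what I gather, constructing a `std::string` from a `char*` copies the characters, so the Java-side buffer only has to stay alive for the duration of the call; a `std::vector` can likewise be built from a pointer and a length.
  • Default arguments are inserted by the compiler at each call site at compile time, so a caller that doesn't go through the C++ compiler has to supply them explicitly.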
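The std conversions can be sketched as follows, again with a hypothetical stand-in `library::CppClass` and hypothetical function names. It assumes a 64-bit platform where `std::size_t` matches the question's `JAVA_LONG` length parameter, and that the string passed in is never null. Both conversions copy, so the shim owns everything the C++ side sees and nothing it allocates outlives the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for library::CppClass; only what this sketch needs.
namespace library {
class CppClass {
 public:
  explicit CppClass(const std::string& argument) : argument_(argument) {}
  virtual ~CppClass() = default;
  const std::string& argument() const { return argument_; }
  bool handleData(std::vector<std::uint8_t>* data) {
    return data != nullptr && !data->empty();
  }

 private:
  std::string argument_;
};
}  // namespace library

extern "C" {

// char* -> const std::string&: std::string copies the bytes, so the caller
// only has to keep its native string alive for the duration of this call.
void* cppclass_new_with_argument(const char* argument) {
  assert(argument != nullptr);
  try {
    return new library::CppClass(std::string(argument));
  } catch (...) {
    return nullptr;  // allocation failed; don't unwind into the caller
  }
}

void cppclass_free(void* self) {
  delete static_cast<library::CppClass*>(self);
}

// (pointer, length) -> std::vector<uint8_t>*: the bytes are copied into a
// vector owned by this frame and destroyed when the call returns.
std::uint8_t cppclass_handle_data(void* self, const std::uint8_t* data,
                                  std::size_t length) {
  assert(self != nullptr);
  try {
    std::vector<std::uint8_t> buffer(data, data + length);
    return static_cast<library::CppClass*>(self)->handleData(&buffer) ? 1 : 0;
  } catch (...) {
    return 0;
  }
}

}  // extern "C"
```

If the real `handleData` writes results into the vector, the shim would also need to copy them back into the caller's buffer. `std::string_view`, as suggested in the comments, would avoid the string copy, but then the native memory must outlive every use of the view.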
gudenau
  • Yep, and this is generally true, not just for C++ and Java. If you want language X to talk to language Y (for any X and Y), the general answer is "go through C" unless there happens to be a better way for the specific languages. Examples of a better way would be: two JVM languages can always talk through Java, anything running in the browser should be able to speak to JavaScript, etc. – Silvio Mayolo Jun 15 '22 at 22:46