Java: Performance of ByteBuffer versus jdk.incubator.foreign (Panama) Foreign Memory methods (MemoryLayout/Segment)

Question

Background

I'm self-studying databases in my spare time, trying to learn by implementing one ground-up.

One of the first things you have to implement is the underlying data format and storage mechanisms.

In databases, there is a structure called a "slotted page", which looks like this:

+-----------------------------------------------------------+
| +----------------------+  +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ |
| | HEADER               |  | | | | | | | | | | | | | | | | |
| |                      |  | | | | | | | | | | | | | | | | |
| +----------------------+  +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ |
|                                     SLOT ARRAY            |
|                                                           |
|                                                           |
|                                                           |
|                 +--------------------+ +----------------+ |
|                 |  TUPLE #4          | |  TUPLE #3      | |
|                 |                    | |                | |
|                 +--------------------+ +----------------+ |
|         +--------------------------+ +------------------+ |
|         |  TUPLE #2                | |  TUPLE #1        | |
|         |                          | |                  | |
|         +--------------------------+ +------------------+ |
+-----------------------------------------------------------+

The page data is serialized in binary form to a file. The slots are the simplest part; their definition might look something like this:

#include <cstdint>

struct Slot {
  uint32_t offset;
  uint32_t length;
};

In C++, reading and writing a slot might be done with std::memcpy:

#include <cstring>

// Ignores the header-size offset: slot i starts at byte sizeof(Slot) * i.
void write_to_buffer(char *buffer, const Slot& slot, uint32_t slot_idx) {
    std::memcpy(buffer + sizeof(Slot) * slot_idx, &slot.offset, sizeof(uint32_t));
    std::memcpy(buffer + sizeof(Slot) * slot_idx + sizeof(uint32_t), &slot.length, sizeof(uint32_t));
}

void read_from_buffer(const char *buffer, Slot& slot, uint32_t slot_idx) {
    std::memcpy(&slot.offset, buffer + sizeof(Slot) * slot_idx, sizeof(uint32_t));
    // The length field sits sizeof(uint32_t) bytes after offset, not sizeof(Slot).
    std::memcpy(&slot.length, buffer + sizeof(Slot) * slot_idx + sizeof(uint32_t), sizeof(uint32_t));
}
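For comparison, here is a sketch that mirrors the C++ functions using ByteBuffer's absolute `getInt(int index)` / `putInt(int index, int value)` methods, which take an explicit byte index the way the `memcpy` calls do. `SlotCodec`, `slotPosition` and the `headerSize` parameter are hypothetical names introduced for illustration. The buffer is set to native byte order so its bytes should match what `memcpy` would produce on the same machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper mirroring the C++ write_to_buffer/read_from_buffer.
public final class SlotCodec {
    static final int SLOT_SIZE = 2 * Integer.BYTES; // sizeof(Slot): two 4-byte fields

    record Slot(int offset, int length) {}

    // Byte position of slot `idx`; headerSize is an assumed parameter (the C++ ignores it).
    static int slotPosition(int headerSize, int idx) {
        return headerSize + SLOT_SIZE * idx;
    }

    static void write(ByteBuffer page, int headerSize, int idx, Slot slot) {
        int pos = slotPosition(headerSize, idx);
        // Absolute puts: like memcpy at an explicit offset, they do not move position().
        page.putInt(pos, slot.offset());
        page.putInt(pos + Integer.BYTES, slot.length());
    }

    static Slot read(ByteBuffer page, int headerSize, int idx) {
        int pos = slotPosition(headerSize, idx);
        return new Slot(page.getInt(pos), page.getInt(pos + Integer.BYTES));
    }

    public static void main(String[] args) {
        // Native order: same byte layout memcpy of a uint32_t gives on this machine.
        ByteBuffer page = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
        write(page, 16, 3, new Slot(4000, 96));
        System.out.println(read(page, 16, 3)); // Slot[offset=4000, length=96]
    }
}
```

One difference from the C++ struct: Java `int` is signed, so a stored value above `Integer.MAX_VALUE` reads back negative; `Integer.toUnsignedLong(...)` recovers the `uint32_t` value if that range matters.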

In Java, as far as I know, you can do one of two things:

ByteBuffer

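import java.nio.ByteBuffer;

record Slot(int offset, int length) {
    // Relative puts: write at the buffer's current position and advance it by 8 bytes.
    void write(ByteBuffer buffer) {
        buffer.putInt(offset).putInt(length);
    }

    // Relative gets: read offset, then length, advancing the position by 8 bytes.
    static Slot read(ByteBuffer buffer) {
        return new Slot(buffer.getInt(), buffer.getInt());
    }
}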
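One detail that matters both for file compatibility and for a fair benchmark: a newly created ByteBuffer is always big-endian, whereas the C++ `memcpy` writes native byte order, and `ValueLayout.JAVA_INT` in the Foreign Memory API also uses native order (little-endian on x86-64). With a non-native order, each access may involve a byte swap. A minimal sketch, wrapping the record above in a hypothetical `ByteOrderDemo` class, showing the bytes each order produces:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class; Slot is the ByteBuffer-based record from the question.
public class ByteOrderDemo {
    record Slot(int offset, int length) {
        void write(ByteBuffer buffer) { buffer.putInt(offset).putInt(length); }
        static Slot read(ByteBuffer buffer) { return new Slot(buffer.getInt(), buffer.getInt()); }
    }

    // Serializes one slot into an 8-byte heap buffer using the given byte order.
    static byte[] encode(Slot slot, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        slot.write(buf);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocate(8).order()); // BIG_ENDIAN (the default)
        // [0, 0, 0, 1, 0, 0, 0, 2]
        System.out.println(Arrays.toString(encode(new Slot(1, 2), ByteOrder.BIG_ENDIAN)));
        // [1, 0, 0, 0, 2, 0, 0, 0]
        System.out.println(Arrays.toString(encode(new Slot(1, 2), ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Calling `buffer.order(ByteOrder.nativeOrder())` once, when the page buffer is created, is usually enough to make ByteBuffer pages byte-compatible with the C++ layout on the same machine.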

The new Foreign Memory API (jdk.incubator.foreign)

import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.MemorySegment;
import jdk.incubator.foreign.ValueLayout;

record Slot(int offset, int length) {
    public static final MemoryLayout LAYOUT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("offset"),
            ValueLayout.JAVA_INT.withName("length"));

    // Reads both fields at absolute offsets from the start of the segment (no cursor).
    public static Slot from(MemorySegment memory) {
        return new Slot(
                memory.get(ValueLayout.JAVA_INT, 0),
                memory.get(ValueLayout.JAVA_INT, Integer.BYTES));
    }

    public void to(MemorySegment memory) {
        memory.set(ValueLayout.JAVA_INT, 0, offset);
        memory.set(ValueLayout.JAVA_INT, Integer.BYTES, length);
    }
}
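The snippet above assumes the segment is already sliced down to one slot. To address slot `idx` inside a whole page, a sketch like the following derives the field offsets from the layout by name instead of hard-coding `Integer.BYTES`. It is written against the finalized `java.lang.foreign` API (JDK 22+), not the `jdk.incubator.foreign` module from the question, because the incubator API changed between releases; `SegmentSlots`, `slotStart` and the `headerSize` parameter are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch for JDK 22+ (java.lang.foreign), not the incubator module.
public class SegmentSlots {
    record Slot(int offset, int length) {
        static final StructLayout LAYOUT = MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("offset"),
                ValueLayout.JAVA_INT.withName("length"));
        // Field offsets looked up by name, so they follow any change to the layout.
        static final long OFFSET_AT =
                LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("offset"));
        static final long LENGTH_AT =
                LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("length"));

        // Start of slot `idx` after an assumed header; keep headerSize a multiple of 4,
        // because JAVA_INT accesses are alignment-checked (JAVA_INT_UNALIGNED is not).
        static long slotStart(long headerSize, long idx) {
            return headerSize + idx * LAYOUT.byteSize();
        }

        static Slot read(MemorySegment page, long headerSize, long idx) {
            long base = slotStart(headerSize, idx);
            return new Slot(page.get(ValueLayout.JAVA_INT, base + OFFSET_AT),
                            page.get(ValueLayout.JAVA_INT, base + LENGTH_AT));
        }

        void write(MemorySegment page, long headerSize, long idx) {
            long base = slotStart(headerSize, idx);
            page.set(ValueLayout.JAVA_INT, base + OFFSET_AT, offset);
            page.set(ValueLayout.JAVA_INT, base + LENGTH_AT, length);
        }
    }

    public static void main(String[] args) {
        // Confined arena: the page's native memory is freed when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment page = arena.allocate(4096, 8); // 4 KiB page, 8-byte aligned
            new Slot(4000, 96).write(page, 16, 3);
            System.out.println(Slot.read(page, 16, 3)); // Slot[offset=4000, length=96]
        }
    }
}
```

Unlike ByteBuffer, segment sizes and offsets here are `long`, and out-of-bounds accesses throw rather than reading past the page.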

What would the performance difference be between these?

I'd prefer the ByteBuffer API if it's negligible.

You should try out both approaches and *measure* the performance. To get an accurate prediction of what performance you will get *in your application*, we would need (full) implementations of the relevant parts of your application code to analyze. But at that point you may as well just measure it yourself ... and skip the predictions. — Stephen C, Aug 20 '22 at 03:37
But looking at your minimal examples, it doesn't strike me that there would be much performance difference ... for those examples. — Stephen C, Aug 20 '22 at 03:38
If your goal here is "self study", then it really doesn't matter one way or the other which one performs better. (But if you have thoughts of using your ground-up reimplementation for anything other than learning, your thoughts are (IMO) probably dreams ... or worse: nightmares. Your hypothetical implementation would most likely turn into *technical debt* for anyone who decided to use it in production code.) — Stephen C, Aug 20 '22 at 03:47
@StephenC That's fair enough, I guess what I was hoping for is someone familiar with the internals of both the ByteBuffer code and the Foreign Memory API to be able to give a comparison/technical breakdown between them. Unfortunately there isn't a lot of info available on the Panama stuff atm. — Gavin Ray, Aug 20 '22 at 04:39
Well 1) that's not what you asked (!!), and 2) that would most likely be Too Broad for a StackOverflow question. (Maybe the solution to there not being enough good technical info on Panama would be for you to do a bunch of research and >write< about what you learned. Getting developers to write decent documentation is a perennial problem, but complaining about it doesn't solve it ...) — Stephen C, Aug 20 '22 at 04:50
Is this `memcpy(buffer + slot.offset, &slot.length, sizeof(uint32_t));` supposed to be `memcpy(buffer + slot.offset, &slot.offset, sizeof(uint32_t));`? — swpalmer, Aug 23 '22 at 17:31

score 2 · Accepted Answer · answered Aug 22 '22 at 21:05

Answering with response from Paul Sandoz on the panama-dev mailing list:

Hi Gavin,

Using MemorySegment will give you far more control over the description (layout) and management (freeing and pooling) than ByteBuffer. Also, if it’s an issue, you will not be constrained by ByteBuffer’s size limitation. Performance-wise, using MemorySegment should be as good as or better than ByteBuffer.

In many respects MemorySegment is a better API to interact with native memory. ByteBuffer was introduced in Java 1.4 with NIO and had additional design constraints in mind that are less relevant today (such as an internal mutable index).

Paul.
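To make the "internal mutable index" concrete: relative `get`/`put` calls read and advance a per-buffer `position`, so two readers sharing one ByteBuffer interfere with each other, and the `Buffer` javadoc states that buffers are not safe for use by multiple concurrent threads. A small sketch (the `PositionDemo` class and `writeTwo` method are hypothetical) showing `flip()`, `duplicate()` and absolute access side by side:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class illustrating ByteBuffer's mutable position.
public class PositionDemo {
    // Writes two ints with relative puts, then flips the buffer for reading.
    static ByteBuffer writeTwo(int a, int b) {
        ByteBuffer page = ByteBuffer.allocate(16);
        page.putInt(a).putInt(b); // relative puts move position from 0 to 8
        return page.flip();       // limit = 8, position = 0
    }

    public static void main(String[] args) {
        ByteBuffer page = writeTwo(10, 20);

        // duplicate() shares the bytes but gets its own position and limit.
        ByteBuffer reader = page.duplicate();
        int first = reader.getInt(); // advances reader's position only
        System.out.println(first + " " + reader.position() + " " + page.position()); // 10 4 0

        // Absolute get: explicit index, the position is neither read nor updated.
        System.out.println(page.getInt(4));  // 20
        System.out.println(page.position()); // 0
    }
}
```

MemorySegment accessors always take an explicit offset, so this kind of cursor bookkeeping does not arise there.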
