Background
I'm self-studying databases in my spare time, trying to learn by implementing one from the ground up.
One of the first things you have to implement is the underlying data format and storage mechanisms.
Databases commonly use a structure called a "slotted page", which looks like this:
+----------------------------------------------------------+
| +----------------------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ |
| |        HEADER        | | | | | | | | | | | | | | | | | |
| |                      | | | | | | | | | | | | | | | | | |
| +----------------------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ |
|                                    SLOT ARRAY            |
|                                                          |
|                                                          |
|                                                          |
|                +--------------------+ +----------------+ |
|                |      TUPLE #4      | |    TUPLE #3    | |
|                |                    | |                | |
|                +--------------------+ +----------------+ |
|        +--------------------------+ +------------------+ |
|        |         TUPLE #2         | |     TUPLE #1     | |
|        |                          | |                  | |
|        +--------------------------+ +------------------+ |
+----------------------------------------------------------+
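To make the diagram concrete, here is a minimal sketch of how inserts into such a page might work, written in Java with a plain ByteBuffer. It assumes the slot array grows forward from the header while tuples are packed backward from the end of the page. The class name SlottedPage, the 4096-byte page size, and the two-field header (slot count and start of tuple data) are assumptions made for illustration, not the format of any particular database.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: header layout and page size are assumptions.
final class SlottedPage {
    static final int PAGE_SIZE = 4096;                 // assumed page size
    static final int HEADER_SIZE = 2 * Integer.BYTES;  // [slotCount][freeEnd]
    static final int SLOT_SIZE = 2 * Integer.BYTES;    // [offset][length]

    private final ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);

    SlottedPage() {
        buf.putInt(0, 0);                      // slotCount = 0
        buf.putInt(Integer.BYTES, PAGE_SIZE);  // freeEnd = end of page
    }

    int slotCount() {
        return buf.getInt(0);
    }

    private int freeEnd() {
        return buf.getInt(Integer.BYTES);
    }

    /** Stores a tuple and returns its slot index, or -1 if it does not fit. */
    int insert(byte[] tuple) {
        int count = slotCount();
        int slotPos = HEADER_SIZE + count * SLOT_SIZE;  // next free slot entry
        int tupleOffset = freeEnd() - tuple.length;     // tuples grow backward
        if (tupleOffset < slotPos + SLOT_SIZE) {
            return -1;  // slot array and tuple data would overlap
        }
        for (int i = 0; i < tuple.length; i++) {
            buf.put(tupleOffset + i, tuple[i]);
        }
        buf.putInt(slotPos, tupleOffset);
        buf.putInt(slotPos + Integer.BYTES, tuple.length);
        buf.putInt(0, count + 1);               // update slotCount
        buf.putInt(Integer.BYTES, tupleOffset); // update freeEnd
        return count;
    }

    /** Returns a copy of the tuple bytes referenced by the given slot. */
    byte[] get(int slotIdx) {
        if (slotIdx < 0 || slotIdx >= slotCount()) {
            throw new IndexOutOfBoundsException("slot " + slotIdx);
        }
        int slotPos = HEADER_SIZE + slotIdx * SLOT_SIZE;
        int offset = buf.getInt(slotPos);
        int length = buf.getInt(slotPos + Integer.BYTES);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buf.get(offset + i);
        }
        return out;
    }
}
```

Each insert only touches the next slot entry and the region just below the previous tuple, which is why the free space in the middle of the diagram shrinks from both sides.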
The page data is serialized in binary form to a file. The slots are the simplest part; their definition might look something like this:
#include <cstdint>

struct Slot {
    uint32_t offset;
    uint32_t length;
};
In C++, reading and writing a slot might be a pair of std::memcpy calls:
#include <cstring>

// Ignoring the header-size offset below
void write_to_buffer(char* buffer, const Slot& slot, uint32_t slot_idx) {
    char* dst = buffer + sizeof(Slot) * slot_idx;
    std::memcpy(dst, &slot.offset, sizeof(uint32_t));
    std::memcpy(dst + sizeof(uint32_t), &slot.length, sizeof(uint32_t));
}

void read_from_buffer(const char* buffer, Slot& slot, uint32_t slot_idx) {
    const char* src = buffer + sizeof(Slot) * slot_idx;
    std::memcpy(&slot.offset, src, sizeof(uint32_t));
    // length is stored immediately after offset
    std::memcpy(&slot.length, src + sizeof(uint32_t), sizeof(uint32_t));
}
In Java, to my knowledge, you can do one of two things:
- ByteBuffer
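import java.nio.ByteBuffer;

record Slot(int offset, int length) {
    // Relative puts: writes at the buffer's current position and advances it
    void write(ByteBuffer buffer) {
        buffer.putInt(offset).putInt(length);
    }

    // Relative gets: reads offset, then length, from the current position
    static Slot read(ByteBuffer buffer) {
        return new Slot(buffer.getInt(), buffer.getInt());
    }
}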
- The new Foreign Function & Memory API (java.lang.foreign)
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

record Slot(int offset, int length) {
    public static final MemoryLayout LAYOUT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("offset"),
        ValueLayout.JAVA_INT.withName("length"));

    public static Slot from(MemorySegment memory) {
        return new Slot(
            memory.get(ValueLayout.JAVA_INT, 0),
            memory.get(ValueLayout.JAVA_INT, Integer.BYTES));
    }

    public void to(MemorySegment memory) {
        memory.set(ValueLayout.JAVA_INT, 0, offset);
        memory.set(ValueLayout.JAVA_INT, Integer.BYTES, length);
    }
}
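Note that the C++ functions address a slot by index, while the ByteBuffer snippet reads and writes at the buffer's current position. A minimal sketch of index-based access using ByteBuffer's absolute getInt/putInt overloads might look like the following. The names IndexedSlot and newPage are hypothetical, and little-endian order is an assumption chosen to match what memcpy would produce on a typical x86-64 machine; ByteBuffer defaults to big-endian, whereas ValueLayout.JAVA_INT uses the platform's native order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical variant of Slot that is addressed by slot index.
record IndexedSlot(int offset, int length) {
    static final int BYTES = 2 * Integer.BYTES;  // size of one serialized slot

    // Assumption: little-endian, to match memcpy output on x86-64.
    static ByteBuffer newPage(int capacity) {
        return ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Absolute puts: write at a computed byte index, position is unchanged.
    // As in the C++ version, the header-size offset is ignored here.
    void write(ByteBuffer buffer, int slotIdx) {
        int base = slotIdx * BYTES;
        buffer.putInt(base, offset);
        buffer.putInt(base + Integer.BYTES, length);
    }

    // Absolute gets: read both fields of the slot at the given index.
    static IndexedSlot read(ByteBuffer buffer, int slotIdx) {
        int base = slotIdx * BYTES;
        return new IndexedSlot(
            buffer.getInt(base),
            buffer.getInt(base + Integer.BYTES));
    }
}
```

Because the absolute accessors leave the buffer's position untouched, the same page buffer can be read by slot index in any order without rewinding.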
What would the performance difference be between these?
I'd prefer the ByteBuffer API if the difference is negligible.