I'm working on a library that targets Apple Silicon devices. The library takes care of allocating, computing and resizing internal state (large Vec<u128>s that grow over time). I've offloaded some of the library's CPU code to the GPU for parallel processing of these large vectors (100x speedup). The GPU code is invoked using metal-rs, a wrapper around Apple's Metal GPU API.
I'm trying to remove the non-negligible overhead of copying the vectors' contents into new buffers (before executing the GPU code) and copying them back into the vectors (after executing the GPU code). This overhead can be avoided on the Apple M1 thanks to its unified CPU and GPU memory architecture: the GPU can compute directly on the memory allocation of a vector.
The Metal API provides a method makeBuffer(bytesNoCopy:length:options:deallocator:) for creating a buffer that "wraps an existing contiguous memory allocation". The memory address this method takes as input "must already be page-aligned". I'm able to pass the raw pointer of my vectors to this method, but I'm struggling to keep the vector's memory page-aligned. There is a similar discussion about aligning the memory of vectors: "How do I allocate a Vec that is aligned to the size of the cache line?" Unfortunately this doesn't suffice since I:
- Don't know the page size of the system at compile time, so can't use #[repr(C, align(...))]
- Need the Vec to remain aligned even if its memory is reallocated (i.e. from resizing)
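
To make the two constraints concrete, here is a minimal sketch of a growable u128 buffer that takes its alignment as a run-time value and keeps that alignment when it grows. It is not a drop-in Vec replacement, and PageAlignedBuf and all of its methods are hypothetical names. It assumes the caller obtains page_size from the OS (for example via sysconf(_SC_PAGESIZE)), and it does not check whether memory from the global allocator meets any of Metal's requirements beyond alignment.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical growable u128 buffer whose allocation stays aligned to a
/// page size that is only known at run time.
pub struct PageAlignedBuf {
    ptr: NonNull<u128>,
    cap_bytes: usize, // always a non-zero multiple of `page_size`
    len: usize,       // number of initialised elements
    page_size: usize,
}

impl PageAlignedBuf {
    /// `page_size` is supplied by the caller (assumption: queried from the OS).
    pub fn new(page_size: usize) -> Self {
        assert!(page_size.is_power_of_two() && page_size >= size_of::<u128>());
        let layout = Layout::from_size_align(page_size, page_size).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc(layout) };
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `raw` was just checked to be non-null.
        let ptr = unsafe { NonNull::new_unchecked(raw.cast::<u128>()) };
        Self { ptr, cap_bytes: page_size, len: 0, page_size }
    }

    pub fn push(&mut self, value: u128) {
        if (self.len + 1) * size_of::<u128>() > self.cap_bytes {
            self.grow();
        }
        // SAFETY: the slot at `len` lies inside the allocation after the check above.
        unsafe { self.ptr.as_ptr().add(self.len).write(value) };
        self.len += 1;
    }

    /// Doubles the capacity. `realloc` keeps the alignment of the layout it is given.
    fn grow(&mut self) {
        let old = Layout::from_size_align(self.cap_bytes, self.page_size).expect("invalid layout");
        let new_cap = self.cap_bytes.checked_mul(2).expect("capacity overflow");
        let new = Layout::from_size_align(new_cap, self.page_size).expect("capacity overflow");
        // SAFETY: `ptr` was allocated with `old`, and `new_cap` forms a valid layout.
        let raw = unsafe { realloc(self.ptr.as_ptr().cast::<u8>(), old, new_cap) };
        if raw.is_null() {
            handle_alloc_error(new);
        }
        // SAFETY: `raw` was just checked to be non-null.
        self.ptr = unsafe { NonNull::new_unchecked(raw.cast::<u128>()) };
        self.cap_bytes = new_cap;
    }

    pub fn as_ptr(&self) -> *const u128 {
        self.ptr.as_ptr()
    }

    pub fn capacity_bytes(&self) -> usize {
        self.cap_bytes
    }

    pub fn as_slice(&self) -> &[u128] {
        // SAFETY: the first `len` elements are initialised.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for PageAlignedBuf {
    fn drop(&mut self) {
        let layout = Layout::from_size_align(self.cap_bytes, self.page_size).expect("invalid layout");
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr().cast::<u8>(), layout) };
    }
}
```

The capacity is kept as a whole number of pages so that both the start address and the length handed to the GPU API fall on page boundaries. The obvious cost is that everything Vec provides for free (iteration, slicing helpers, extend, and so on) has to be re-exposed by hand.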
I like using Vec for all the functionality it provides, but I would also like to know how to keep its memory allocation page-aligned at all times.
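
For reference, the property in question can be checked at run time with a small helper like the one below. This is only a sketch: is_page_aligned is a hypothetical name, and page_size is assumed to be a power of two obtained from the OS rather than a compile-time constant.

```rust
/// Returns true when `ptr` sits on a `page_size` boundary.
/// `page_size` must be a power of two (assumption: queried at run time).
pub fn is_page_aligned<T>(ptr: *const T, page_size: usize) -> bool {
    assert!(page_size.is_power_of_two());
    // For a power-of-two page size, the low bits of an aligned address are all zero.
    ((ptr as usize) & (page_size - 1)) == 0
}
```

A call such as debug_assert!(is_page_aligned(v.as_ptr(), page_size)) before handing v's pointer to Metal would catch a misaligned allocation early. With a plain Vec<u128>, the only alignment guaranteed is that of u128 itself, so this check may pass or fail from one reallocation to the next.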