I saw that it is possible to use CUDA to write to memory-mapped files (reference: cuda - Zero-copy memory, memory-mapped file).
I am wondering if it is somehow possible in PyTorch to write a CUDA tensor directly to a memory-mapped file that stays on the GPU.
The purpose of this is to speed up writing tensors after each training step. Currently,

```python
with torch.no_grad():
    # .detach() alone is enough; .data is redundant here
    numpyMemmap[arrayOfRandomIndexes] = u_embeddings.weight.detach().cpu().numpy()
```
takes about 6 seconds. I think this is because the numpy memory map lives in CPU memory, so every step pays for a full GPU-to-CPU copy plus a scattered write into the memmap. I need something that writes in a fraction of a second, since I will be storing the tensors after each training step and there will be hundreds of thousands of training steps.
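For reference, here is a self-contained version of the slow path I am timing. The shapes, the file name, and the use of a random permutation for the indexes are placeholders I made up for this snippet, not my real setup:

```python
import time

import numpy as np
import torch

# Placeholder sizes; my real embedding table is different.
num_embeddings, dim = 1_000_000, 128
u_embeddings = torch.nn.Embedding(num_embeddings, dim).cuda()

# On-disk float32 memmap with one row per embedding.
numpyMemmap = np.memmap("embeddings.dat", dtype=np.float32,
                        mode="w+", shape=(num_embeddings, dim))

# Placeholder for my index array: one destination row per weight row.
arrayOfRandomIndexes = np.random.permutation(num_embeddings)

with torch.no_grad():
    start = time.time()
    # GPU -> CPU copy of the whole weight, then a scattered write
    # into the memmap. This is the part that takes seconds.
    numpyMemmap[arrayOfRandomIndexes] = u_embeddings.weight.detach().cpu().numpy()
    print(f"write took {time.time() - start:.2f}s")
```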