Yes, you can. You asked about the mutable case, but I'll preface by saying that if the Vec is read-only (e.g. for a reduction) you can safely send an immutable reference to the specific slice you want to each thread. You can do this by simply using something like &my_vec[idx1..idx2] in a loop.
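As a minimal sketch of the read-only case, the function below hands each thread an immutable sub-slice and adds up the partial sums. The name parallel_sum and the num_threads parameter are made up for illustration, and it assumes Rust 1.63 or later for std::thread::scope.

```rust
use std::thread;

/// Sums `data` by giving each thread an immutable sub-slice.
/// `parallel_sum` and `num_threads` are hypothetical names for this sketch.
fn parallel_sum(data: &[f32], num_threads: usize) -> f32 {
    assert!(num_threads > 0, "need at least one thread");
    let len = data.len();
    // Ceiling division so that every element lands in some sub-slice.
    let per_thread = (len + num_threads - 1) / num_threads;

    thread::scope(|s| {
        let mut handles = Vec::new();
        for i in 0..num_threads {
            // Clamp both ends so trailing threads may get an empty slice.
            let start = (i * per_thread).min(len);
            let end = (start + per_thread).min(len);
            let part = &data[start..end];
            // Shared references can be handed to any number of threads.
            handles.push(s.spawn(move || part.iter().sum::<f32>()));
        }
        // Combine the partial sums on the calling thread.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<f32>()
    })
}
```

Calling parallel_sum(&my_vec, 4) splits my_vec into at most four ranges. Because the slices are only read, no synchronization beyond joining the threads is needed.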
For the mutable case it's a bit trickier, since the borrow checker is not sophisticated enough to allow non-overlapping mutable borrows of a Vec through plain indexing. However, there are a number of methods, notably split_at_mut, that you can call to get these subslices. By far the easiest is the chunks_mut iterator from the standard library's slice API. (Note that there is a matching chunks iterator for the immutable case, so you only need to make minor changes when writing either case.)
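A short sketch of both approaches, assuming Rust 1.63 or later for std::thread::scope. The function names double_in_parallel and fill_halves are invented for this example; only chunks_mut, split_at_mut and fill come from the standard library.

```rust
use std::thread;

/// Doubles every element in place, one thread per chunk.
/// `chunk_size` is the number of elements per chunk and must be non-zero.
fn double_in_parallel(data: &mut [f32], chunk_size: usize) {
    thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut [f32] slices, so each
        // one can be moved into a different thread.
        for chunk in data.chunks_mut(chunk_size) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2.0;
                }
            });
        }
        // Threads spawned in the scope are joined when the scope ends.
    });
}

/// Writes `a` into the first half and `b` into the second half,
/// using split_at_mut to obtain two disjoint mutable slices.
fn fill_halves(data: &mut [f32], a: f32, b: f32) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(move || left.fill(a));
        s.spawn(move || right.fill(b));
    });
}
```

split_at_mut is handy when you need exactly two pieces or uneven sizes, while chunks_mut covers the common case of many equal-sized pieces.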
Be aware that the chunks and chunks_mut functions take the size of each chunk, not the number of chunks. However, deriving one from the other is fairly straightforward.
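One way to derive the chunk size from a desired number of chunks is a ceiling division. The helper name chunk_size_for is hypothetical, and this is only a sketch.

```rust
/// Returns a chunk size to pass to chunks/chunks_mut so that a slice of
/// `len` elements is split into at most `num_chunks` pieces.
/// `chunk_size_for` is a hypothetical helper name.
fn chunk_size_for(len: usize, num_chunks: usize) -> usize {
    assert!(num_chunks > 0, "need at least one chunk");
    // Ceiling division: round up so no elements are left over.
    let size = (len + num_chunks - 1) / num_chunks;
    // chunks and chunks_mut panic on a size of 0, so never return 0.
    size.max(1)
}
```

Because of the rounding, the last chunk may be shorter than the others, and for some lengths you get fewer chunks than requested (9 elements in 4 chunks gives a size of 3 and therefore 3 chunks).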
I would like to give a few words of caution about the mutable case, however. If you simply split the data evenly you may get abysmal performance. The reason is that the CPU doesn't work on individual addresses; it works on blocks of memory known as cache lines, which are typically 64 bytes long. If multiple threads work on a single cache line and at least one of them writes to it, the cores have to go through slower levels of the memory hierarchy to keep the line consistent between threads.
Unfortunately, in safe Rust there's no easy way to determine where on a cache line a Vec's buffer starts (the allocation may begin in the middle of a CPU cache line). Most of the methods I know of to detect this involve twiddling with the lower bits of the actual pointer address. The easiest way to handle this is to simply add a 64-byte pad of nonsense data between the chunks you want to use. So, for instance, if you have a Vec containing 1000 32-bit floats and 10 threads, you add 16 floats with a dummy value (since 32 bits = 4 bytes, 16 * 4 = 64 bytes = 1 cache line) between each block of 100 "real" floats and ignore the dummies during computation.
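The padding scheme could look roughly like the sketch below. All three function names are invented for illustration. It assumes a 64-byte cache line, that the data length is a multiple of the block size, and Rust 1.63 or later for std::thread::scope. For simplicity it also appends a pad after the last block, which is harmless.

```rust
use std::thread;

/// Copies `real` into a new Vec with `pad` dummy values after every
/// `block` real values. Assumes `real.len()` is a multiple of `block`.
fn pad_blocks(real: &[f32], block: usize, pad: usize) -> Vec<f32> {
    assert!(block > 0 && real.len() % block == 0);
    let mut out = Vec::with_capacity(real.len() / block * (block + pad));
    for b in real.chunks(block) {
        out.extend_from_slice(b);
        // Dummy values; they are never read during the computation.
        out.resize(out.len() + pad, 0.0);
    }
    out
}

/// Adds 1.0 to every real value, one thread per padded block.
fn add_one_padded(padded: &mut [f32], block: usize, pad: usize) {
    thread::scope(|s| {
        for chunk in padded.chunks_mut(block + pad) {
            s.spawn(move || {
                // Only the first `block` elements are real data.
                for x in &mut chunk[..block] {
                    *x += 1.0;
                }
            });
        }
    });
}

/// Collects the real values back out, skipping the padding.
fn strip_padding(padded: &[f32], block: usize, pad: usize) -> Vec<f32> {
    padded
        .chunks(block + pad)
        .flat_map(|c| &c[..block])
        .copied()
        .collect()
}
```

With block = 100 and pad = 16, the 1000 floats from the example become 1160. The 64 bytes between blocks mean the last real float of one block and the first real float of the next can never sit on the same 64-byte line, wherever the buffer happens to start.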
The underlying problem is known as false sharing, and I encourage you to look up other references to learn other methods of dealing with it.
Note that the 64-byte line size is what current x86 processors use, though the architecture itself does not guarantee it. If you're compiling for ARM, PowerPC, MIPS, or something else this value can and will vary.
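Since the line size varies, a sketch of the pointer-twiddling approach is best written with the line size as a parameter. Both function names below are hypothetical. The code only inspects the low bits of the buffer address and assumes the line size is a power of two.

```rust
use std::mem::size_of;

/// Byte offset of the slice's first element within its cache line,
/// assuming lines of `line_size` bytes (must be a power of two).
fn offset_in_cache_line<T>(data: &[T], line_size: usize) -> usize {
    assert!(line_size.is_power_of_two());
    // Casting a pointer to usize is allowed in safe code; the low bits
    // of the address give the position within the line.
    (data.as_ptr() as usize) & (line_size - 1)
}

/// Number of leading f32 elements to skip so that the remaining slice
/// starts on a cache-line boundary.
fn elements_until_aligned(data: &[f32], line_size: usize) -> usize {
    let off = offset_in_cache_line(data, line_size);
    // f32 values are 4-byte aligned, so the distance to the next
    // boundary is a whole number of elements.
    ((line_size - off) % line_size) / size_of::<f32>()
}
```

You would pass 64 on typical x86 machines and whatever value applies to your target elsewhere. The result describes one particular allocation, so it has to be recomputed if the Vec reallocates.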