What is the correct way to write to a memory mapped file with no synchronization from multiple threads in Rust?

I need to create a 40+ GB file using multiple threads. The file is used as a giant vector of u64 values. The threads do not need any kind of synchronization: each thread's output is unique to that thread, but each thread does NOT get its own contiguous slice. Rather, the nature of the data ensures that each thread generates a unique set of positions in the file to write to. A simple example: each thread writes to the positions `ind` where `ind % thread_count` equals that thread's index, with `ind` running into the millions. For thread_count = 2, one thread writes to odd positions and the other to even ones.

I have used memmap2, a newer, maintained fork of the memmap library. memmap2 seems to do everything I need for accessing the file, but I do not know how to use it properly from multiple threads.

Yuri Astrakhan
  • A lazy way would be for each thread to just create its own mapping of the region that it needs. – Nate Eldredge Oct 31 '21 at 18:43
  • Does this answer your question? [How do I pass disjoint slices from a vector to different threads?](https://stackoverflow.com/questions/33818141/how-do-i-pass-disjoint-slices-from-a-vector-to-different-threads) – kmdreko Oct 31 '21 at 18:50
  • @kmdreko it doesn't seem so -- each thread will not write to a dedicated slice. Instead, each thread could randomly write pretty much anywhere in the giant file, but I just know that they won't conflict when writing (because of the nature of the data being processed) – Yuri Astrakhan Oct 31 '21 at 19:02
  • @NateEldredge I was actually wondering whether it's OK for each thread to open its own mutable map of the file. My only concern is that I would have to do my own thread management instead of using `rayon::iter::ParallelBridge`... I am new to Rust, so I'm not sure how to even start with that. – Yuri Astrakhan Oct 31 '21 at 19:10
  • I would seriously consider transmuting the `&[u8]` slice returned by `Mmap::as_ref()` to `&[AtomicU8]`. Such a transmute is unsafe but sound because `AtomicU8` is guaranteed to have the same representation as `u8`. Share the resulting slice (and only that slice) among your threads, and have them store data using `slice[pos].store(value, Ordering::Relaxed)`, which will (on x86) be compiled into an ordinary store. – user4815162342 Oct 31 '21 at 20:20
  • @user4815162342 could you provide a sample of how to do this? I'm a Rust newbie despite lots of other language experience, so would be very helpful to see a working example of that. Thx! – Yuri Astrakhan Oct 31 '21 at 20:24
  • @YuriAstrakhan I meant [something like this](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ab57740082c9e1c6fc3ba64ffe5561bb). (Using `memmap` as example because the playground supports it, `memmap2` should work the same.) – user4815162342 Oct 31 '21 at 20:32
  • @user4815162342 Transmuting `&[u8]` into `&[AtomicU8]` is not sound because `AtomicU8` has interior mutability but `u8` does not. Mutating data reached through a shared reference without interior mutability [is undefined behavior](https://doc.rust-lang.org/reference/behavior-considered-undefined.html). To make it sound you would need to use `&mut [u8]` instead of `&[u8]` (which is only a [small change](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=e9547ee6cc73cee6f05280c71abf252a) to your example). – Frxstrem Nov 01 '21 at 04:48
  • @Frxstrem Makes sense, thanks. What do you think of using `from_raw_parts()` in preference to `transmute`? – user4815162342 Nov 01 '21 at 06:18
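
For readers following the thread, here is a minimal sketch of the idea discussed in the comments above, adapted to `u64` values: start from an exclusive `&mut [u8]`, reinterpret it with `from_raw_parts` as a shared `&[AtomicU64]`, and let each thread store into its own positions with `Ordering::Relaxed`. It is not taken from either linked playground. The helper names `as_atomic_u64`, `fill_strided` and `demo` are hypothetical, and the buffer is a plain `Vec<u64>` standing in for the mapping so the example runs with the standard library alone. With memmap2, the assumption is that the bytes would instead come from a mutable mapping (for example `&mut mmap[..]` on a `MmapMut`), which has not been tested here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: view an exclusively borrowed byte buffer as a shared
/// slice of atomics. Returns None if the buffer is misaligned for AtomicU64
/// or its length is not a multiple of 8.
fn as_atomic_u64(bytes: &mut [u8]) -> Option<&[AtomicU64]> {
    let size = std::mem::size_of::<AtomicU64>();
    let align = std::mem::align_of::<AtomicU64>();
    let ptr = bytes.as_mut_ptr();
    if (ptr as usize) % align != 0 || bytes.len() % size != 0 {
        return None;
    }
    // SAFETY: we hold the only reference to `bytes` (it is `&mut`), the
    // pointer is aligned, the length fits, and every bit pattern is a valid
    // AtomicU64. The returned slice keeps `bytes` borrowed for its lifetime.
    Some(unsafe { std::slice::from_raw_parts(ptr as *const AtomicU64, bytes.len() / size) })
}

/// Hypothetical helper: thread `t` writes to every position `i` with
/// `i % thread_count == t`, so no two threads touch the same element.
fn fill_strided(cells: &[AtomicU64], thread_count: usize) {
    thread::scope(|s| {
        for t in 0..thread_count {
            s.spawn(move || {
                for i in (t..cells.len()).step_by(thread_count) {
                    // Relaxed is enough: positions never overlap, and the
                    // scope joins every thread before returning.
                    cells[i].store(i as u64 * 10, Ordering::Relaxed);
                }
            });
        }
    });
}

/// Runs the pattern on an in-memory stand-in for the mapped file.
fn demo(len: usize, thread_count: usize) -> Vec<u64> {
    // A Vec<u64> guarantees 8-byte alignment, like a page-aligned mapping.
    let mut backing = vec![0u64; len];
    // SAFETY: the Vec owns len * 8 initialized bytes and is not accessed
    // through `backing` while this byte view is in use.
    let bytes: &mut [u8] =
        unsafe { std::slice::from_raw_parts_mut(backing.as_mut_ptr().cast::<u8>(), len * 8) };
    let cells = as_atomic_u64(bytes).expect("buffer is aligned and sized for u64");
    fill_strided(cells, thread_count);
    cells.iter().map(|c| c.load(Ordering::Relaxed)).collect()
}

fn main() {
    println!("{:?}", demo(8, 2)); // prints [0, 10, 20, 30, 40, 50, 60, 70]
}
```

Because the view is built from `&mut [u8]`, the interior-mutability objection raised above does not apply, and the values land in the buffer in native byte order. Whether plain non-atomic writes through raw pointers would also be acceptable is a separate question that this sketch does not try to settle.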

0 Answers