
I am looking for a way to directly read the content of a file into the provided uninitialized byte array.

Currently, I have code like the following:

```rust
use std::fs::File;
use std::io::Read;
use std::mem::MaybeUninit;

let mut buf: MaybeUninit<[u8; 4096]> = MaybeUninit::zeroed();
let mut f = File::open("some_file")?;
f.read(unsafe { buf.as_mut_ptr().as_mut().unwrap() })?;
```

The code does work, except that it unnecessarily initializes the byte array with 0. I would like to replace `MaybeUninit::zeroed()` with `MaybeUninit::uninit()`, but doing so would trigger undefined behavior according to the documentation of `MaybeUninit`. Is there a way to initialize an uninitialized memory region with the content of the file, without first reading the data somewhere else, using only the standard library? Or do we need to reach for an OS-specific API?

Seiichi Uchida
  • I recommend taking a look at [`Read::initializer()`](https://doc.rust-lang.org/nightly/std/io/trait.Read.html#method.initializer). It provides an (unstable) interface for initializing the buffer for a particular `Read` implementation, in case that `Read` implementation requires the buffer to be initialized. (Very few, if any, `Read` implementations need an initialized buffer.) – Sven Marnach Sep 16 '19 at 07:20
  • @SvenMarnach Thanks, I have never heard of it before. I will take a look. – Seiichi Uchida Sep 16 '19 at 15:12

1 Answer


The previous shot at the answer is kept below for posterity. Let's deal with the actual elephant in the room:

Is there a way to initialize an uninitialized memory region with the content of the file without first reading the data to somewhere else, by only using the standard library? Or do we need to go for the OS-specific API?

There is: `Read::read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize>`

This function drains your `impl Read` object and, depending on the underlying implementation, performs one or more reads, appending all bytes to the provided `Vec` and growing it as it goes.

It then returns the number of bytes read. Note that the read can be interrupted, and that error needs to be handled.


You are trying to micro-optimize based on heuristics that you believe hold, but which do not.

The initialization of the array is done in one go, about as low-level as it gets, with `memset`, all in one chunk. Both `calloc` and `malloc`+`memset` are highly optimized, and `calloc` relies on a trick or two to be even faster. Somebody on Code Review pitted "highly optimized" hand-written code against a naive implementation and lost as a result.

The takeaway is that second-guessing the compiler is typically fraught with issues and, overall, not worth micro-optimizing for unless you can put some real numbers on the issues.

The second takeaway is one of memory logic. As I am sure you are aware, allocating memory can be dramatically faster or slower depending on where the allocation lands and on the size of the contiguous chunk you request, because memory is laid out in atomic units (pages). This factor is much more impactful; to the point that, under the hood, the allocator will often align your request to an entire page to avoid having to fragment one, which matters even more once the data reaches the L1/L2 caches.

If anything isn't clear, let me know and I'll generate some small benchmarks for you.

Finally, MaybeUninit is not at all the tool you want for the job in any case. The point of MaybeUninit isn't to skip a memset or two: you still end up performing that initialization yourself, because `assume_init` contractually requires you to guarantee that the memory holds valid values of the type. There are use cases for it, but they are rare.

In larger cases

There is an impact on performance between uninitialized and initialized memory, and we're going to show this by taking an absolutely perfect scenario: we're going to make ourselves a 64M buffer in memory and wrap it in a Cursor so we get a Read type. This Read type will have far, far lower latency than most I/O operations you will encounter in the wild, since it is almost guaranteed to reside entirely in L2 cache during the benchmark cycle (due to its size) or L3 cache (because we're single-threaded). This should allow us to notice the performance loss from memsetting.

We're going to run three versions for each case (the code):

  • One where we define our buffer as [MaybeUninit::uninit().assume_init(); N], i.e. we're taking N chunks of MaybeUninit<u8>
  • One where our MaybeUninit is a contiguous N-element long chunk
  • One where we're just mapping straight into an initialized buffer

The results (on a core i9-9900HK laptop):

```
large reads/one uninit  time:   [1.6720 us 1.7314 us 1.7848 us]
large reads/small uninit elements
                        time:   [2.1539 us 2.1597 us 2.1656 us]
large reads/safe        time:   [2.0627 us 2.0697 us 2.0771 us]
small reads/one uninit  time:   [4.5579 us 4.5722 us 4.5893 us]
small reads/small uninit elements
                        time:   [5.1050 us 5.1219 us 5.1383 us]
small reads/safe        time:   [7.9654 us 7.9782 us 7.9889 us]
```

The results are as expected:

  • Allocating N MaybeUninit is slower than one huge chunk; this is completely expected and should not come as a surprise.
  • Small, iterative 4096-byte reads are slower than a huge, single, 128M read even when the buffer only contains 64M
  • There is a small performance loss in reading using initialized memory, of about 30%
  • Opening anything else on the laptop while testing causes a 50%+ increase in benchmarked time

The last point is particularly important, and it becomes even more important when dealing with real I/O as opposed to a buffer in memory. The more layers of cache you have to traverse, the more side-effects you get from other processes impacting your own processing. If you are reading a file, you will typically encounter:

  • The filesystem cache (may or may not be swapped)
  • L3 cache (if on the same core)
  • L2 cache
  • L1 cache

Depending on the level of the cache that produces a cache miss, you're more or less likely to have your performance gain from using uninitialized memory dwarfed by the performance loss in having a cache miss.

So, the (unexpected) TL;DR:

  • Small, iterative reads are slower
  • There is a performance gain in using MaybeUninit, but it is typically an order of magnitude less than any I/O optimization
Sébastien Renauld
  • I don't think this answer is appropriate here. Having to initialize big buffers before doing I/O can have a significant performance impact. Rust already provides the unstable `Read::initializer()` interface for this use case. While I personally don't have experience with this particular problem, [at least some experts think that it matters](https://github.com/rust-lang/rust/issues/42788#issuecomment-516877440). – Sven Marnach Sep 16 '19 at 07:16
  • @SvenMarnach In a general situation, you'd be right. In this case, where the OP explicitly set his buffer to exactly the size of **one** page, however? – Sébastien Renauld Sep 16 '19 at 07:19
  • I took this as example code for this question, and interpreted the question more generally. Fair enough. – Sven Marnach Sep 16 '19 at 07:22
  • @SvenMarnach I'm editing my answer and providing a memory-optimized sample for much larger reads as a result of your comment, though. Let's make this one unambiguous - thanks for the input :-) – Sébastien Renauld Sep 16 '19 at 07:24
  • @SvenMarnach Added a ton of info, benchmarks, results and conclusions for the generic case – Sébastien Renauld Sep 16 '19 at 08:31
  • "There is a small performance loss in reading using initialized memory, of about 30%" % is NOT a small penalty ! – Stargateur Sep 16 '19 at 09:58
  • @Stargateur Keep in mind that this is for reading from memory, so for slower devices the penalty is expected to be lower. However, for some high-performance network applications, it can probably come close. – Sven Marnach Sep 16 '19 at 15:21
  • @SébastienRenauld Thanks for the detailed explanation with some cool benchmark results! It's my bad that I too simplified my example code, but my usage is not limited to reading a fixed size of region from the file. I basically need a `mmap` which does not suck :) – Seiichi Uchida Sep 16 '19 at 15:23
  • @SébastienRenauld Nit: I think that calling `MaybeUnint::unint().assume_init()` is a straight undefined behavior. – Seiichi Uchida Sep 16 '19 at 15:27
  • now **that** is a potential better use. if you wouldn't mind elaborating; I'm about to take a plane and will read and adjust my answer accordingly when I'm on the ground again :-) – Sébastien Renauld Sep 16 '19 at 15:28
  • I may be wrong on that but, due to the types (flat, fixed width array of `u8`) the invariant holds. if we had a `Vec`, or were dealing with `i8` then all bets would be off and we would firmly be in the realm of UB – Sébastien Renauld Sep 16 '19 at 15:36
  • @SeiichiUchida no, `MaybeUnint::unint().assume_init()` is not instant UB https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#initializing-an-array-element-by-element – Stargateur Sep 16 '19 at 16:43
  • @SeiichiUchida I've addressed the last part of your question: how to actually read an `impl Read` object to its end while using an external buffer. – Sébastien Renauld Sep 16 '19 at 17:14