Is an array required to be initialized if I will read file contents into it?

Question

I am initializing buf unnecessarily with all zeros before immediately writing over its contents with read_exact:

use std::io::{self, Read};

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    let mut buf = [0u8; 1024];
    r.read_exact(&mut buf)?;
    // Do things to buf
    Ok(())
}

While I don't think the 0-initialization is very time-consuming, it doesn't seem necessary. Is there a way around this?

Check the machine code first. I know C and C++ compilers will remove writes that are overwritten by other writes, if they can see the code. — Zan Lynx, Nov 27 '17 at 01:39

Shepmaster · Accepted Answer · 2017-12-01T03:56:45.743

Yes, such an initialization is required in safe Rust. The compiler isn't necessarily able to tell that the combination of read_exact and ? will prevent any access to uninitialized data in the array. The optimizer may be able to remove the redundant zeroing, but that cannot be relied upon.

You could instead read into a Vec, which internally guarantees to never allow access to the uninitialized memory:

use std::io::{self, Read};

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    let mut buf = vec![]; // Can optionally use `Vec::with_capacity`
    r.take(1024).read_to_end(&mut buf)?;
    // Do things to buf
    Ok(())
}
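One behavioral difference from read_exact is worth knowing about: `take(1024).read_to_end` does not fail on short input; the Vec simply ends up shorter. The minimal sketch below shows both cases. The helper name `read_up_to_1024` is hypothetical, and `Cursor` merely stands in for a real file. If you need exactly 1024 bytes, check `buf.len()` afterwards.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical helper using the same approach as above:
// `take` caps how many bytes `read_to_end` may pull in.
fn read_up_to_1024<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(1024);
    r.take(1024).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Longer input: only the first 1024 bytes are read.
    let mut long = Cursor::new(vec![7u8; 4000]);
    assert_eq!(read_up_to_1024(&mut long)?.len(), 1024);

    // Shorter input: no error, the Vec is just shorter.
    let mut short = Cursor::new(vec![7u8; 10]);
    assert_eq!(read_up_to_1024(&mut short)?.len(), 10);
    Ok(())
}
```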

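In unsafe Rust, you can use mem::uninitialized. Make sure you understand all the terrible little nuances involved before choosing such a solution! Reading uninitialized memory is undefined behavior in Rust, so you must make absolutely sure that you prevent it from ever happening. (Note that mem::uninitialized has since been deprecated in favor of std::mem::MaybeUninit; its documentation now states that having uninitialized data in a variable is undefined behavior even for integer types such as u8.)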

Here, we rely on:

- The fact that read_exact will return an error if it didn't populate all of the bytes.
- The fact that the destructor of [u8; 1024] will not read the bytes. This is an important path to consider when an error occurs or a panic is raised.

use std::io::{self, Read};
use std::mem;

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    unsafe {
        let mut buf: [u8; 1024] = mem::uninitialized();
        r.read_exact(&mut buf)?;
        // Do things to buf
    }
    Ok(())
}

I wouldn't think this would be worth it without heavy profiling.
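The first assumption listed above, that read_exact reports a short read as an error rather than handing back a partially filled buffer, can be observed in safe Rust. The sketch below uses a zero-initialized buffer so it stays sound; the helper `fill_1024` is hypothetical, and a byte slice stands in for a file.

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical helper: try to fill a 1024-byte buffer from `data`.
fn fill_1024(data: &[u8]) -> io::Result<[u8; 1024]> {
    let mut buf = [0u8; 1024]; // zeroed, so this sketch needs no unsafe
    let mut src: &[u8] = data; // `&[u8]` implements Read
    src.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Enough input: the whole buffer is filled.
    assert!(fill_1024(&[9u8; 2000]).is_ok());

    // Too little input: read_exact returns UnexpectedEof, and the
    // `?` in fill_1024 propagates it before `buf` is handed out.
    let err = fill_1024(&[9u8; 10]).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
}
```

The standard library documents that when read_exact returns an error, how many bytes were read is unspecified, so the buffer's contents must not be relied on in that case either way.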

By making `buf` uninitialized, wouldn't there be a possibility of buf being freed while still being uninitialized memory (which is UB)? E.g. if `read_exact` failed before writing anything into it, either by panicking or by returning an error. If so, I don't think it should be included in this answer without a disclaimer. — Timidger, Nov 29 '17 at 17:15
@Timidger fair point — whenever I see `unsafe` I treat the code with extreme suspicion, but having an explicit warning is always smart. In this case, I do believe it to be safe for the reasons I've added. — Shepmaster, Dec 01 '17 at 03:57

Jesin · Answer 2 · 2017-11-27T03:26:03.030

There is currently no way to do this with read_exact in Rust outside of unsafe blocks. However, you can probably get what you want in this particular case by using std::io::BufReader::with_capacity instead of allocating your own [u8] buffer.
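A minimal sketch of the BufReader::with_capacity idea, assuming you only need to look at one buffer's worth of data: the BufReader owns the 1024-byte buffer, and fill_buf hands back a slice of whatever it has buffered. The return value (number of bytes inspected) is an addition for illustration, not part of the original signature.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical `parse`: let BufReader own a 1024-byte buffer instead of
// declaring `[0u8; 1024]` ourselves. Returns how many bytes were inspected.
fn parse<R: Read>(r: &mut R) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(1024, r);
    // At most 1024 bytes here; possibly fewer, or empty at EOF.
    let buf = reader.fill_buf()?;
    let n = buf.len();
    // Do things to buf
    reader.consume(n); // mark those bytes as used
    Ok(n)
}

fn main() -> io::Result<()> {
    let data = vec![42u8; 3000];
    let mut src: &[u8] = &data; // a byte slice stands in for a file
    assert_eq!(parse(&mut src)?, 1024);
    // Only one buffer's worth was pulled from the underlying reader.
    assert_eq!(src.len(), 3000 - 1024);
    Ok(())
}
```

Keep in mind that dropping a BufReader discards any bytes it buffered but you did not consume, so in real code you would usually keep the BufReader around for the whole parse.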

If you don't care about being able to set the buffer length yourself, you could have your function require the std::io::BufRead trait for its argument instead of Read.

Either of the approaches I mentioned will allow you to use fill_buf and consume instead of read_exact, which should give you the performance you want.
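The BufRead variant could look roughly like the sketch below: the function is generic over BufRead, so the caller chooses the buffering, and the loop works directly on the reader's internal buffer via fill_buf and consume. The name `count_bytes` and the byte counting are hypothetical stand-ins for "Do things to buf".

```rust
use std::io::{self, BufRead, BufReader};

// Hypothetical processing loop over any BufRead.
fn count_bytes<R: BufRead>(r: &mut R) -> io::Result<usize> {
    let mut total = 0;
    loop {
        let buf = r.fill_buf()?;
        if buf.is_empty() {
            break; // an empty slice means EOF
        }
        let n = buf.len();
        total += n; // process buf here
        r.consume(n); // tell the reader these bytes are done
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let data = vec![1u8; 5000];
    // The caller picks the buffer size; 1024 mirrors the original array.
    let mut reader = BufReader::with_capacity(1024, &data[..]);
    assert_eq!(count_bytes(&mut reader)?, 5000);
    // A plain byte slice implements BufRead directly, too.
    let mut slice: &[u8] = &data;
    assert_eq!(count_bytes(&mut slice)?, 5000);
    Ok(())
}
```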
