6

I am initializing buf unnecessarily with all zeros before immediately writing over its contents with read_exact:

use std::io::{self, Read};

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    let mut buf = [0u8; 1024];
    r.read_exact(&mut buf)?;
    // Do things to buf
    Ok(())
}

While I don't think the 0-initialization is very time-consuming, it doesn't seem necessary. Is there a way around this?

Shepmaster
  • 1
    Check the machine code first. I know C and C++ compilers will remove writes that are overwritten by other writes, if it can see the code. – Zan Lynx Nov 27 '17 at 01:39

2 Answers

7

Yes, such an initialization is required in safe Rust. The compiler isn't necessarily able to tell that the combination of read_exact and ? will prevent accesses to any uninitialized data from the array. The optimizer may remove the redundant zeroing, but that cannot be counted on.

You could instead read into a Vec, which internally guarantees to never allow access to the uninitialized memory:

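use std::io::{self, Read};

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    let mut buf = vec![]; // Can optionally use `Vec::with_capacity`
    // Reads at most 1024 bytes; unlike `read_exact`, a shorter read is not an error
    r.take(1024).read_to_end(&mut buf)?;
    // Do things to buf
    Ok(())
}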
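
Because read_to_end through take can return fewer than 1024 bytes without failing, the Vec version does not behave exactly like read_exact. A minimal sketch of one way to restore the "all or error" behavior follows. The names read_block and LEN are hypothetical and only for illustration, and whether the standard library zeroes the spare capacity internally is an implementation detail this sketch does not rely on.

```rust
use std::io::{self, Read};

// Hypothetical block size, matching the 1024 bytes used in the question.
const LEN: usize = 1024;

// Hypothetical helper: reads exactly LEN bytes into a Vec or returns an error.
fn read_block<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    // Reserve the space up front; the Vec never exposes unwritten bytes.
    let mut buf = Vec::with_capacity(LEN);
    // `by_ref` reborrows the reader so the caller can keep using it afterwards.
    r.by_ref().take(LEN as u64).read_to_end(&mut buf)?;
    // `read_to_end` stops quietly at EOF, so check the length explicitly.
    if buf.len() != LEN {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "input ended before a full block was read",
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut reader = io::Cursor::new(vec![7u8; 2048]);
    let buf = read_block(&mut reader)?;
    assert_eq!(buf.len(), LEN);
    Ok(())
}
```

The explicit length check mirrors the UnexpectedEof error that read_exact reports on a short read.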

In unsafe Rust, you can use mem::uninitialized (deprecated since Rust 1.39 in favor of mem::MaybeUninit). Make sure you understand all the terrible little nuances involved before choosing such a solution! Reading uninitialized memory is undefined behavior in Rust, so you must make absolutely sure that you prevent it from ever happening.

Here, we rely on:

  1. The fact that read_exact will return an error if it didn't populate all of the bytes.
  2. The fact that the destructor of [u8; 1024] will not read the bytes. This is an important path to consider when an error occurs or if a panic is raised.

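use std::io::{self, Read};
use std::mem;

fn parse<R: Read>(r: &mut R) -> io::Result<()> {
    unsafe {
        // `mem::uninitialized` is deprecated since Rust 1.39 in favor of `MaybeUninit`
        let mut buf: [u8; 1024] = mem::uninitialized();
        r.read_exact(&mut buf)?;
        // Do things to buf
    }
    Ok(())
}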

I wouldn't think this would be worth it without heavy profiling.

Shepmaster
  • 1
    By making `buf` uninitialized, wouldn't there be a possibility of `buf` being freed while still being uninitialized memory (which is UB)? E.g., if `read_exact` failed before writing anything into it, either from a panic or by returning an error. If so, I don't think it should be included in this answer without a disclaimer. – Timidger Nov 29 '17 at 17:15
  • @Timidger fair point — whenever I see `unsafe` I treat the code with extreme suspicion, but having an explicit warning is always smart. In this case, I do believe it to be safe for the reasons I've added. – Shepmaster Dec 01 '17 at 03:57

3

There is currently no way to do this with read_exact in Rust outside of unsafe blocks. However, you can probably get what you want in this particular case by using std::io::BufReader::with_capacity instead of allocating your own mut [u8].

If you don't care about being able to set the buffer length yourself, you could have your function require the std::io::BufRead trait for its argument instead of Read.

Either of the approaches I mentioned will allow you to use fill_buf and consume instead of read_exact, which should give you the performance you want.
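
A minimal sketch of the fill_buf and consume pattern, under some assumptions: the helper name parse_buffered, the constant LEN, and the byte-sum "processing" are hypothetical stand-ins. Note that fill_buf does not promise to return a full LEN bytes even when more input exists, so this sketch simply treats a shorter buffer as an error; a real parser may need to handle partial fills differently.

```rust
use std::io::{self, BufRead, BufReader};

// Hypothetical block size, matching the 1024 bytes used in the question.
const LEN: usize = 1024;

// Hypothetical helper: processes LEN bytes directly in the reader's own buffer,
// so the caller never allocates or zero-fills a separate array.
fn parse_buffered<R: BufRead>(r: &mut R) -> io::Result<usize> {
    let available = r.fill_buf()?;
    if available.len() < LEN {
        // Simplification: `fill_buf` may legitimately return less than LEN.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "fewer than LEN bytes were buffered",
        ));
    }
    // Stand-in for "do things to buf": sum the bytes of the block.
    let sum: usize = available[..LEN].iter().map(|&b| b as usize).sum();
    // Tell the reader that LEN bytes have been used up.
    r.consume(LEN);
    Ok(sum)
}

fn main() -> io::Result<()> {
    let data = vec![1u8; 2048];
    // A byte slice implements `Read`; the capacity makes one fill hold a full block.
    let mut reader = BufReader::with_capacity(LEN, &data[..]);
    let sum = parse_buffered(&mut reader)?;
    assert_eq!(sum, LEN);
    Ok(())
}
```

Taking R: BufRead lets the caller decide the buffer size, for example by wrapping a reader with BufReader::with_capacity as in main above.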

Jesin