I have a reader that contains info about a 51*51 grid, where each spot on the grid is represented by an f32. I want to read this data into a vector so that I can easily deal with it:

pub fn from_reader<R: Read + Seek>(reader: &mut R) -> Arena {
    let arena_size = 51 * 51;
    let arena_byte_size = arena_size * size_of::<f32>();
    let mut arena = vec![0.0f32; arena_size];

    unsafe {
        let mut arena_slice =
            std::slice::from_raw_parts_mut(arena.as_mut_ptr() as *mut u8, arena_byte_size);
        let _ = reader.read(&mut arena_slice);
    };
    //...
}

This method is inconvenient and unnecessarily slow, as it forces the vector to be initialized with 0 values for all its elements. I originally wanted to simply allocate a buffer, not initialize it, read the data into it, then use from_raw_parts to create a vector out of it. However, I was informed that this is undefined behavior since, for some unfathomable reason, read and read_exact require the caller to initialize the data being passed to them before calling either of them.

Why is this the case? Is there any workaround? Are there any solutions being worked on by the Rust team?

Shepmaster
zee

1 Answer

Why is this the case?

Because it's valid for an implementer of Read to read the passed-in buffer first. If you passed in uninitialized data and the implementer of Read looked at the buffer, then there would be undefined behavior in purely safe code. Disallowing that, statically, is a large selling point of Rust.

use std::io::{self, Read};

struct Dummy;

impl Read for Dummy {
    fn read(&mut self, buffer: &mut [u8]) -> io::Result<usize> {
        let v: u8 = buffer.iter().sum(); // Reading from the buffer
        buffer[0] = v;
        Ok(1)
    }
}

fn main() {
    let mut data = [0, 1, 2];
    Dummy.read(&mut data).unwrap();
    println!("{:?}", data);
}
  • Why does Read::read not prevent reading from the buffer?

    There isn't a language construct that can be used to impose that restriction. Unlike some other languages, Rust doesn't have "out parameters". Even if it did, I could see an implementer of Read wanting the ability to read the data that it just wrote. For example, a reader that counted the number of newlines that passed through it.

  • Why does Read::read not accept MaybeUninit?

    MaybeUninit didn't exist in Rust 1.0; it was only stabilized in Rust 1.36. We wanted the ability to read from files in Rust 1.0. Due to Rust's backwards-compatibility guarantees, the method's signature cannot be changed now.

  • Why is Read::read not unsafe?

    This would have been the main (only?) technique to support uninitialized data, but it would have come at a high cost. unsafe isn't a tool that experienced Rust programmers choose trivially. When we do use it, we generally strive really hard to minimize its scope.

    If Read::read were unsafe, then every implementer would have to think about how to properly meet the unsafe criteria. This is a high burden to place on "simple" adapters.
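
As a minimal sketch of the newline-counting reader mentioned above: the adapter below forwards to an inner reader and then inspects the bytes that were just written into the buffer. The name NewlineCounter and its fields are hypothetical, chosen only for illustration. It only looks at the prefix the inner reader reported as filled, but nothing in the type system would stop it from looking at the rest of the buffer too, which is why the caller has to pass initialized memory.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: wraps any reader and counts the newlines passing through it.
struct NewlineCounter<R> {
    inner: R,
    newlines: usize,
}

impl<R: Read> Read for NewlineCounter<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Let the wrapped reader fill the buffer first.
        let n = self.inner.read(buf)?;
        // Then read back the bytes that were just written.
        self.newlines += buf[..n].iter().filter(|&&b| b == b'\n').count();
        Ok(n)
    }
}
```

Wrapping a byte slice such as `&b"a\nb\nc"[..]` and draining it with read_to_string should leave the newlines field at 2.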

Is there any workaround? Are there any solutions being worked on by the Rust team?

The unstable Read::initializer method is one proposed solution, but it's likely not the preferred route.

RFC 2930 provides an updated attempt, and discusses much of the backstory and challenges.

For your specific case, you can probably use Read::take and Read::read_to_end to read all of the bytes you want into an empty (uninitialized!) Vec and then convert a Vec<T> to a Vec<U> without copying the vector. You will need to somehow ensure that you have properly aligned the Vec for f32 as it starts out as only aligned for u8.
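
A hedged sketch of the take + read_to_end idea follows. To stay in safe code and sidestep the alignment question, it swaps the zero-copy Vec<u8> to Vec<f32> conversion for a copying decode with f32::from_ne_bytes. The function name read_arena and the constant ARENA_SIZE are hypothetical, and native byte order is assumed because the original code reinterprets the raw bytes in place.

```rust
use std::io::{self, Read};
use std::mem::size_of;

/// Number of f32 cells in the 51*51 grid from the question.
const ARENA_SIZE: usize = 51 * 51;

/// Hypothetical helper: reads exactly ARENA_SIZE f32 values in native byte order.
fn read_arena<R: Read>(reader: &mut R) -> io::Result<Vec<f32>> {
    let byte_len = ARENA_SIZE * size_of::<f32>();

    // The Vec starts empty; read_to_end appends to it, so no zero-filling is needed here.
    let mut bytes = Vec::with_capacity(byte_len);
    let read = reader.by_ref().take(byte_len as u64).read_to_end(&mut bytes)?;
    if read != byte_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "arena data truncated",
        ));
    }

    // Copying decode: no alignment requirement on the byte buffer.
    Ok(bytes
        .chunks_exact(size_of::<f32>())
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

This trades one extra pass over roughly 10 KB for having no unsafe block at all. The zero-copy conversion described above would avoid that pass, at the cost of having to guarantee the allocation's alignment and layout yourself.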

Shepmaster
  • this is a great answer, and mostly satisfactory, but I'd say that it's lacking in explaining why the situation is what it is, because while all that is said is technically true, it doesn't explain why the rust team chose to set it up that way. In the documentation for read and read_exact it is clearly stated that the rust team actively discourages implementations which read contents from the buffer, but still permits them. Under what situation would reading from the buffer be desirable? why would the rust team allow it? It seems to unnecessarily complicate the situation. – zee Dec 03 '21 at 22:06
  • @zee *why would the rust team allow it* — what Rust language technique would you use to disallow it? There isn't one. – Shepmaster Dec 03 '21 at 22:08
  • It's not desirable, and implementations are not supposed to read from input buffers. The problem is that there's no way to *prevent* it. The library designers don't want correctness to hinge on unknown implementors adhering to the rules. That would be the C/C++ answer: impose unenforceable requirements and declare that if they're not followed it's UB. Safe Rust has a higher standard. – John Kugelman Dec 03 '21 at 22:09
  • @Shepmaster There aren't any language techniques to disallow many things that are guaranteed by the standard, but still, if the standard required that a conforming implementation not read from the buffer, then any implementation that did read from it would be nonconforming and unfavorable. I'll set this as the correct answer for now as it's obviously the best explanation with the best workarounds suggested, but an understanding of why this is allowed is still desirable. – zee Dec 03 '21 at 22:12
  • @zee I’ll go more into historical detail later this evening – Shepmaster Dec 03 '21 at 22:16
  • @zee *There aren't any language techniques to disallow many things that are guaranteed by the standard* — can you share any that cause memory unsafety when they are misused? – Shepmaster Dec 03 '21 at 22:26
  • @JohnKugelman I'd quibble with "safer if it were unsafe". I'd agree with "more performant", certainly, but `unsafe` code requires that the caller *and* implementer have **read and understood the documentation**; it's not just wrapping `unsafe {}` around it. Involving the documentation leaves the statically enforced world of the compiler and moves towards the C/C++ example you mention. – Shepmaster Dec 03 '21 at 22:53
  • @Shepmaster I’ll try my best to explain what I mean, but please be patient with me. The way I understood your comment is that it is saying "there is no way to ensure the logic the implementer will use with language rules, therefore we make no guarantees about what they'll do to the data you pass them.". But for example there isn’t a language requirement that ensures `Vec::len()` will return `self.len`, what if it returned `self.len-5`? Clearly such an implementation would be non-conforming. – zee Dec 03 '21 at 23:02
  • Or even if we take the `read` function being discussed here, it's not like we can guarantee the implementer would write data from `self` to the buffer provided, they could just write data from anywhere they want, but we’d still call such an implementation non-conforming. No one would look at the two examples above and suggest the rust standard should change its wording to give no guarantees about any function behaving in any specific manner, so why are we dealing with it here? I’m sure there is a valid reason, I just don’t get it myself – zee Dec 03 '21 at 23:02
  • to make it a tl;dr, it seems the root of the issue here is that no language techniques exist to enforce the desired logic, and therefore we can't make a guarantee with just the word of the documentation because it would violate what rust stands for. But the very basis of describing function behavior relies on everyone following the word of the standard because there is no way to completely enforce it through language techniques. – zee Dec 03 '21 at 23:06
  • @zee note that I specifically said "cause memory unsafety". `Vec::len` returning the wrong value would be bad, I agree, but if it did so, no memory unsafety could occur. If the implementer of `Read` used some "other" data, that would also be wrong, but wouldn't cause memory unsafety. Make sure you check out the ["But how bad are undefined values really?" section](https://rust-lang.github.io/rfcs/2930-read-buf.html#but-how-bad-are-undefined-values-really) in the RFC and [Working With Uninitialized Memory](https://doc.rust-lang.org/nomicon/uninitialized.html) in the Rustonomicon. – Shepmaster Dec 03 '21 at 23:14
  • @zee I believe I see a possible misunderstanding in this discussion. When Shepmaster speaks of an "implementer", he's not referring to an alternate implementation of Rust, which would indeed be required to correctly implement the language. He's referring to a *user-defined type* that might implement the `Read` trait. Such a type is not constrained by conforming to the language in the same way that `Vec::len()` or `File::read()` are. Passing uninitialized slices to an arbitrary implementation of `read()` would enable safe code to exhibit undefined behavior, which Rust doesn't allow. – user4815162342 Dec 04 '21 at 12:27