6

I have something that is Read; currently it's a File. I want to read a number of bytes from it that is only known at runtime (length prefix in a binary data structure).

So I tried this:

let mut vec = Vec::with_capacity(length);
let count = file.read(vec.as_mut_slice()).unwrap();

but count is zero because vec.as_mut_slice().len() is zero as well.

[0u8;length] of course doesn't work because the size must be known at compile time.

I wanted to do

let mut vec = Vec::with_capacity(length);
let count = file.take(length).read_to_end(vec).unwrap();

but take's receiver parameter is a T and I only have &mut T (and I'm not really sure why it's needed anyway).

I guess I can replace File with BufReader and dance around with fill_buf and consume, which sounds complicated enough, but I still wonder: have I overlooked something?

Shepmaster
musiKk

3 Answers

5

Like the Iterator adaptors, the IO adaptors take self by value to be as efficient as possible. Also like the Iterator adaptors, a mutable reference to a Read is also a Read.

To solve your problem, you just need Read::by_ref:

use std::io::Read;
use std::fs::File;

fn main() {
    let mut file = File::open("/etc/hosts").unwrap();
    let length = 5;

    let mut vec = Vec::with_capacity(length);
    file.by_ref().take(length as u64).read_to_end(&mut vec).unwrap();

    let mut the_rest = Vec::new();
    file.read_to_end(&mut the_rest).unwrap();
}
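
Since the question mentions a length prefix, here is a minimal sketch of how the same by_ref() trick might look inside a function that only has a &mut R. The helper name read_record and the one-byte prefix format are assumptions made for illustration, not something from the question, and read_exact assumes Rust 1.6 or later:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads a one-byte length prefix, then that many payload bytes.
/// The one-byte prefix is an assumed format, chosen only to keep the sketch short.
fn read_record<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 1];
    reader.read_exact(&mut prefix)?;
    let length = prefix[0] as usize;

    let mut payload = Vec::with_capacity(length);
    // `by_ref()` hands `take` a borrow, so `reader` stays usable afterwards.
    let read = reader
        .by_ref()
        .take(length as u64)
        .read_to_end(&mut payload)?;

    // `take` stops early at end of input, so a short payload must be checked for.
    if read != length {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "payload shorter than its length prefix",
        ));
    }
    Ok(payload)
}
```

Calling read_record twice on the same reader returns two consecutive records, because each call only borrows the reader.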
Shepmaster
  • I found out I don't even have to call `by_ref()`. Rust seems to do it automatically on a `&File`, it also doesn't seem to be necessary to be mutable. This confuses me greatly! Is `by_ref()` some sort of magic method? – musiKk Jul 19 '15 at 16:24
  • @musiKk No, it is not magic, it's actually extremely simple, [its implementation](https://github.com/rust-lang/rust/blob/cb87ea80a6f3411c889f9a48b65724b32659c171/src/libstd/io/mod.rs#L352-L353) is one line and the body is one keyword. The fact that a reference works seems... incorrect to me. I might have to ask my own question! – Shepmaster Jul 19 '15 at 16:34
  • Don't sweat it, [I'm done with one of my own](http://stackoverflow.com/questions/31503429/why-can-i-call-file-take-on-a-reference). ;) – musiKk Jul 19 '15 at 16:34
2

1. Fill-this-vector version

Your first solution is close to working. You identified the problem but did not try to solve it! The problem is that whatever the capacity of the vector, it is still empty (vec.len() == 0). Instead, you could actually fill it with zeroed elements, such as:

let mut vec = vec![0u8; length];

The following full code works:

// `as_mut_slice()` needed `#![feature(convert)]` as of 2015-07-19; it has been stable since Rust 1.7

use std::fs::File;
use std::io::Read;

fn main() {
    let mut file = File::open("/usr/share/dict/words").unwrap();
    let length: usize = 100;
    let mut vec = vec![0u8; length];
    let count = file.read(vec.as_mut_slice()).unwrap();
    println!("read {} bytes.", count);
    println!("vec = {:?}", vec);
}

Of course, you still have to check whether count == length, and read more data into the buffer if that's not the case.
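
A rough sketch of what that "read more data" loop might look like. The helper name read_fully is made up for this example, and retrying on ErrorKind::Interrupted is a design choice of the sketch rather than something the code above does:

```rust
use std::io::{self, Read};

/// Hypothetical helper: keeps calling `read` until `buf` is full or the
/// reader reports end of input. Returns how many bytes were actually read.
fn read_fully<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,       // end of input before the buffer was full
            Ok(n) => filled += n, // partial read: continue after these bytes
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

The caller can then compare the returned count against buf.len() to detect a truncated input.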


2. Iterator version

Your second solution is better because you won't have to check how many bytes have been read, and you won't have to re-read in case count != length. You need to use the bytes() method of the Read trait (implemented by File). This transforms the file into a stream (i.e. an iterator) of bytes. Because errors can still happen, you don't get an Iterator<Item=u8> but an Iterator<Item=io::Result<u8>>. Hence you need to deal with failures explicitly within the iterator. We're going to use unwrap() here for simplicity:

use std::fs::File;
use std::io::Read;

fn main() {
    let file = File::open("/usr/share/dict/words").unwrap();
    let length: usize = 100;
    let vec: Vec<u8> = file
        .bytes()
        .take(length)
        .map(|r: Result<u8, _>| r.unwrap()) // or deal explicitly with failure!
        .collect();
    println!("vec = {:?}", vec);
}
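
If you would rather not unwrap(), one possible variation is to collect straight into an io::Result, which stops at the first error. The function name read_bytes is invented for this sketch, and by_ref() is used so that the caller keeps the reader:

```rust
use std::io::{self, Read};

/// Hypothetical helper: collects up to `length` bytes, returning the first
/// I/O error instead of panicking.
fn read_bytes<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    reader
        .by_ref()     // borrow the reader instead of consuming it
        .bytes()      // iterator of io::Result<u8>
        .take(length) // at most `length` items
        .collect()    // Result<Vec<u8>, io::Error>
}
```

Note that the returned vector can be shorter than length if the input ends early, so the length still needs checking when an exact count matters.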
mdup
  • I have thought about something like that but isn't that rather inefficient? It looks like allocation of the `Vec` changes from O(1)ish to O(n). – musiKk Jul 19 '15 at 14:24
  • Sure, but you're going O(n) anyway when filling the vector, so it's a matter of `2n` vs. `n`, which is not *that* bad. I would be more worried about the check `count == length` which is avoided in the 2nd version (see edit). – mdup Jul 19 '15 at 14:32
  • I guess the question boiled down to "how do I create a `Vec` with a particular length". When I think about it, higher-level languages that shield the developer from this probably do this behind the scenes, too. Otherwise I'm aware of error handling and checking the number of bytes actually read. I just left it out for brevity. Thanks. :) – musiKk Jul 19 '15 at 14:36
  • Your solution with `bytes()` actually has the same problem as my `take()`: It consumes the `file` but I only have a borrowed mutable reference to the file so this doesn't work. - Ok, I have no idea what's going on. It does work with a reference to the file but not if I have a reference to a struct that contains the file. – musiKk Jul 19 '15 at 14:39
  • @musiKk *"It looks like allocation of the Vec"* looks can be deceiving, and Rust / LLVM do an amazing job of optimizing iterators. You need to look at assembly to actually know how much is allocated. I'll point to [`Iterator::size_hint`](http://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint) as a concrete example of how it can be more efficient as well. – Shepmaster Jul 19 '15 at 16:22
  • Please use `vec![0u8; length]` to build the vector, it's shorter, performs better, and should be the idiom. – bluss Aug 14 '15 at 21:35
1

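You can also use a bit of unsafe to create a vector of uninitialized memory and skip the zeroing. Be aware that this relies on the reader only writing to the buffer and never reading from it; the documentation of Read::read states that calling it with an uninitialized buffer is not safe and can lead to undefined behavior: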

let mut vec: Vec<u8> = Vec::with_capacity(length);
unsafe { vec.set_len(length); }
let count = file.read(vec.as_mut_slice()).unwrap();

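This way, vec.len() is set to length, and the bytes in the vector are uninitialized (likely zeros, but possibly some garbage). This avoids zeroing the memory, but handing an uninitialized buffer to an arbitrary Read implementation is not guaranteed to be sound, so prefer vec![0u8; length] unless the zeroing turns out to matter.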

Note that the read() method on Read is not guaranteed to fill the whole slice. It may return after reading fewer bytes than the slice length. There are several RFCs on adding methods to fill this gap, for example, this one.
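
For readers on a newer toolchain: Read::read_exact, stable since Rust 1.6, fills this gap in the standard library. A minimal sketch, assuming such a toolchain (the wrapper name read_exact_vec is made up here), that uses a zeroed buffer instead of an uninitialized one:

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: reads exactly `length` bytes or returns an error.
fn read_exact_vec<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    // Zero-initialized buffer, so no `unsafe` is needed.
    let mut buf = vec![0u8; length];
    // `read_exact` keeps reading until `buf` is full and fails with
    // `ErrorKind::UnexpectedEof` if the input ends first.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With this there is no count to check: either the whole buffer was filled or an error comes back.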

Vladimir Matveev
  • This is very interesting. For now I went with `file.take(length).read_to_end(vec)`. I had a problem with a move out of a borrowed struct that I now managed to fix. That's why I dismissed this solution in my question. – musiKk Jul 19 '15 at 15:48