
Coming to Rust from dynamic languages like Python, I'm not used to the programming pattern where you provide a function with a mutable reference to an empty data structure and that function populates it. A typical example is reading a file into a String:

    use std::fs::File;
    use std::io::Read; // brings read_to_string into scope

    let mut f = File::open("file.txt").unwrap();
    let mut contents = String::new();
    f.read_to_string(&mut contents).unwrap();

To my Python-accustomed eyes, an API where you just create an owned value within the function and move it out as a return value looks much more intuitive / ergonomic / what have you:

    // Hypothetical API, not what std provides on Read:
    let mut f = File::open("file.txt").unwrap();
    let contents = f.read_to_string().unwrap();

Since the Rust standard library takes the former road, I figure there must be a reason for that.

Is it always preferable to use the reference pattern? If so, why? (Performance reasons? What specifically?) If not, how do I spot the cases where it might be beneficial? Is it mostly useful when I want to return another value in addition to populating the result data structure (as in the first example above, where .read_to_string() returns the number of bytes read)? Why not use a tuple? Is it simply a matter of personal preference?

Shepmaster
dlukes
  • Rust is designed for system programming. This function gets a buffer to write in, and returns the written size. This is a very common design in such languages (think about C's `sprintf`, for example). This design is only useful in case of buffers IMO. – Boiethios Aug 18 '17 at 12:35
  • Related: [Why does Rust have both call by value and call by reference?](https://stackoverflow.com/questions/36562262/why-does-rust-have-both-call-by-value-and-call-by-reference). – ljedrz Aug 18 '17 at 12:44

1 Answer


If read_to_string returned an owned String, it would have to heap-allocate a new String on every call. Also, because Read implementations don't always know how much data there is to read, it would probably have to reallocate the work-in-progress String several times as it grows. Every temporary String would also have to go back to the allocator to be destroyed.

This is wasteful. Rust is a systems programming language. Systems programming languages abhor waste.

Instead, the caller is responsible for allocating and providing the buffer. If you only call read_to_string once, nothing changes. If you call it more than once, however, you can reuse the same buffer without the constant allocate/resize/deallocate cycle. Although it doesn't apply in this specific case, similar interfaces can be designed to also support stack buffers, meaning in some cases you can avoid heap activity entirely.
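As a rough sketch of both points, here are two helpers whose names (`count_nonempty_lines`, `count_bytes`) are made up for illustration and are not part of std. The first reuses one String across many `BufRead::read_line` calls; the second reads through a fixed-size stack array with `Read::read`. The 64-byte buffer size is an arbitrary assumption.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper: counts non-blank lines while reusing a single
/// String for every line instead of allocating one per line.
fn count_nonempty_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    let mut count = 0;
    loop {
        // `clear` drops the contents but keeps the allocated capacity,
        // so later lines can usually be read without reallocating.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

/// Hypothetical helper: counts bytes by reading through a stack buffer,
/// so this function itself performs no heap allocation.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; 64]; // arbitrary size, lives on the stack
    let mut total = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        total += n;
    }
    Ok(total)
}
```

Both helpers accept any reader, so they can be tried on an in-memory byte slice such as `&b"a\nb\n"[..]` before being pointed at a file.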

Having the caller pass the buffer in is strictly more flexible than the alternative.
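One way to see the asymmetry: an owned-return function is easy to build on top of the buffer-taking one, but not the other way around. The wrapper below is a minimal sketch; `read_all` is a hypothetical name, not a std function.

```rust
use std::io::{self, Read};

/// Hypothetical convenience wrapper: returns an owned String by
/// delegating to the buffer-taking `Read::read_to_string`.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut contents = String::new();
    reader.read_to_string(&mut contents)?;
    Ok(contents)
}
```

For the common one-shot file case, the standard library also offers `std::fs::read_to_string(path)`, which returns an owned String. Given only an owned-return API, there would be no way to recover buffer reuse.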

DK.
  • So to sum up, the typical use for this pattern would be in functions where it's unclear at compile time how much memory needs to be allocated to hold the result and/or it might be a good idea to reuse previously allocated memory, and therefore these decisions are left to the caller? Is that a fair description? (In other words, [this comment](https://stackoverflow.com/questions/45756582/functions-in-rust-populating-a-reference-vs-moving-an-owned-value?noredirect=1#comment78470827_45756582) would be right in saying that it's primarily useful for buffers.) – dlukes Aug 18 '17 at 13:22
  • @dlukes Broadly, yes. – DK. Aug 18 '17 at 15:20
  • @dlukes Regardless of whether you know at compile time how much memory is needed, if you already have a buffer and you want to re-use it, there is no way to use a function that returns an owned buffer. This might come up, for instance, when writing a program like `cat` that reads a file line-by-line and discards each line before reading the next. There are lots of situations where you have a buffer "in hand" and it's more efficient to reuse it than trash it and make a new one. – trent Aug 18 '17 at 15:23
  • Why not just create the buffer, pass it by giving the `read_to_string` function its ownership, then have that function return the filled buffer back? – MaiaVictor Apr 25 '18 at 20:38
  • @MaiaVictor: That doesn't work if *your* code is given a borrow from somewhere else. The buffer might be stack-allocated, or kept in a structure somewhere else. A mutable borrow is the most flexible choice available. – DK. Apr 27 '18 at 07:10
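
The situation described in the last comment can be sketched as follows. `append_from` is a hypothetical function that is itself handed only a `&mut String`; it can lend that borrow onward, whereas a by-value API would require moving a buffer it does not own.

```rust
use std::io::{self, Read};

/// Hypothetical function: `out` is borrowed from the caller, so it
/// cannot be moved into an API that takes the buffer by value
/// (short of swapping it out first, e.g. with `std::mem::take`).
/// Reborrowing it for `read_to_string` needs no such workaround.
fn append_from<R: Read>(mut reader: R, out: &mut String) -> io::Result<usize> {
    // `read_to_string` appends to the existing contents and returns
    // the number of bytes read.
    reader.read_to_string(out)
}
```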