Consider the following snippet:

struct Foo {
    dummy: [u8; 65536],
}

fn bar(foo: Foo) {
    println!("{:p}", &foo)
}

fn main() {
    let o = Foo { dummy: [42u8; 65536] };
    println!("{:p}", &o);
    bar(o);
}

A typical result of the program is

0x7fffc1239890
0x7fffc1229890

where the addresses are different.

Apparently the large array dummy has been copied, which is what the compiler's implementation of a move would be expected to do. Unfortunately, this can have a non-trivial performance impact, since dummy is very large. That cost can push people to pass the argument by reference instead, even when the function conceptually "consumes" the argument.
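For comparison, here is a minimal sketch of the by-reference workaround mentioned above. The name `bar_ref` and its return value are hypothetical, introduced only so the address can be checked; when borrowing, the callee sees the caller's object rather than a copy.

```rust
struct Foo {
    dummy: [u8; 65536],
}

// Hypothetical variant of `bar` that borrows its argument instead of consuming it.
// It returns the address it observed so the caller can compare it.
fn bar_ref(foo: &Foo) -> *const Foo {
    println!("{:p}", foo);
    foo as *const Foo
}

fn main() {
    let o = Foo { dummy: [42u8; 65536] };
    println!("{:p}", &o);
    let seen = bar_ref(&o);
    // A reference is just a pointer to `o`, so both addresses are identical.
    assert!(std::ptr::eq(seen, &o));
    // `o` is still usable here, which is exactly what a "consuming" API would forbid.
    assert_eq!(o.dummy[0], 42);
}
```

The trade-off is the one described above: the array is never copied, but the signature no longer expresses that the function takes ownership.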

Since Foo does not derive Copy, the object o is moved. Since Rust forbids access to a moved object, what prevents bar from "reusing" the original object o, and instead forces the compiler to generate a potentially expensive bit-wise copy? Is there a fundamental difficulty, or will the compiler someday optimise this bit-wise copy away?

Lukas Kalbertodt
WiSaGaN
  • Rustc does optimize moves. It isn't doing so in this case, probably because LLVM didn't inline bar. This might even be because you are trying to observe the pointer values, and LLVM isn't sure if that's safe to optimize. I tried it without the `:p` prints and used test::black_box instead, and the copy vanishes from the assembly. – Manishearth Jul 25 '16 at 16:37
  • @Manishearth `bar` is getting inlined. LLVM is just bad at removing moves of large arrays. – Veedrac Jul 25 '16 at 22:31
  • The issues with `NRVO` tag are related to this: https://github.com/rust-lang/rust/labels/A-mir-opt-nrvo – WiSaGaN Oct 19 '20 at 05:32
  • Is dropping `o` guaranteed in this case? Given that it was moved into `bar()`, at what point is the memory of `o` freed? – Ilya Loskutov Mar 25 '21 at 18:22

1 Answer

Given that in Rust (unlike C or C++) the address of a value is not considered to matter, there is nothing in terms of language that prevents the elision of the copy.

However, today rustc does not optimize anything: all optimizations are delegated to LLVM, and it seems you have hit a limitation of the LLVM optimizer here (it's unclear whether this limitation is due to LLVM being close to C's semantics or is just an omission).

So, there are two avenues of improving code generation for this:

  • teaching LLVM to perform this optimization (if possible)
  • teaching rustc to perform this optimization (optimization passes are coming to rustc now that it has MIR)

For now, though, you might simply want to avoid allocating such large objects on the stack; you can Box them, for example.
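A minimal sketch of the Box suggestion, reusing Foo from the question. The changed signature of `bar` (taking `Box<Foo>` and returning the address it saw) is an assumption made for illustration. Moving a Box moves only the pointer, so the 64 KiB payload stays where it is on the heap.

```rust
struct Foo {
    dummy: [u8; 65536],
}

// Consumes the box and returns the address of the heap-allocated Foo it received.
fn bar(foo: Box<Foo>) -> *const Foo {
    let addr: *const Foo = &*foo;
    println!("{:p}", addr);
    assert_eq!(foo.dummy[0], 42);
    addr
    // `foo` is dropped here, which frees the heap allocation.
}

fn main() {
    let o = Box::new(Foo { dummy: [42u8; 65536] });
    let before: *const Foo = &*o;
    println!("{:p}", before);
    // Only the pointer is moved into `bar`; the array itself is not copied.
    let after = bar(o);
    assert_eq!(before, after);
}
```

Both printed addresses are the same here, whereas the by-value version in the question printed two different stack addresses. The returned raw pointer is only compared, never dereferenced, since the allocation is gone once `bar` returns.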

Matthieu M.
  • Speaking of MIR optimization passes, the first one would be a simple move destination propagation pass: https://github.com/rust-lang/rust/pull/34693. The tracking issue is https://github.com/rust-lang/rust/issues/32966. – eddyb Jul 25 '16 at 17:05
  • Instead of just avoiding stack allocation, it would be better to assume the move will be optimized and only box things later if it wasn't. Most of the time in Rust you shouldn't be thinking about trying to avoid copying things. – Michael Younkin Jul 26 '16 at 02:48
  • @MichaelYounkin: I partially agree. The problem is that large objects copied a few times on the stack easily lead to stack-overflow, especially with Debug targets where optimizations do not occur. If the buffer is very large, the cost of the dynamic allocation should be dwarfed by the cost of initializing the buffer itself anyway. – Matthieu M. Jul 26 '16 at 10:31
  • @MatthieuM allocating it on the heap is all very well, but in my experience, even writing Box::new(BigStruct::new()) first allocates the BigStruct on the stack (in BigStruct::new), then copies it to the heap (in Box::new). Or am I missing something? – Pierre-Antoine Dec 22 '17 at 07:29
  • @Pierre-Antoine: In Debug, yes, for now; this is why [placement new](https://github.com/rust-lang/rust/issues/27779) is so sought after. In Release, the stack copy should hopefully be optimized out anyway, but this may lead to Stack Overflows in Debug that prevent you from testing your code :( – Matthieu M. Dec 25 '17 at 11:16
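
One workaround that is not mentioned in the thread, sketched here under the assumption that the data can be held as a boxed slice rather than as a fixed-size struct: build the buffer through `vec!` and convert it, so the bytes are written into a heap allocation instead of a stack temporary. The helper name `big_buffer` is hypothetical.

```rust
// Hypothetical helper: returns a heap-allocated buffer of `len` bytes, each set to 42.
// `vec![value; len]` allocates on the heap and fills the allocation in place,
// so no `len`-byte array has to exist on the stack first.
fn big_buffer(len: usize) -> Box<[u8]> {
    vec![42u8; len].into_boxed_slice()
}

fn main() {
    let buf = big_buffer(65536);
    assert_eq!(buf.len(), 65536);
    assert!(buf.iter().all(|&b| b == 42));
}
```

This sidesteps the Debug-build stack copy discussed in the comments, at the cost of losing the length from the type.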