
I thought that once an object is moved, the memory it occupied on the stack could be reused for other purposes. However, the minimal example below shows the opposite.

#[inline(never)]
fn consume_string(s: String) {
    drop(s);
}

fn main() {
    println!(
        "String occupies {} bytes on the stack.",
        std::mem::size_of::<String>()
    );

    let s = String::from("hello");
    println!("s at {:p}", &s);
    consume_string(s);

    let r = String::from("world");
    println!("r at {:p}", &r);
    consume_string(r);
}

After compiling the code with the --release flag, it gives the following output on my computer.

String occupies 24 bytes on the stack.
s at 0x7ffee3b011b0
r at 0x7ffee3b011c8

It is pretty clear that even though s is moved, r does not reuse the 24-byte chunk on the stack that originally belonged to s. I suppose that reusing the stack memory of a moved object is safe, but why does the Rust compiler not do it? Am I missing any corner case?

Update: If I enclose s in curly brackets, r can reuse the 24-byte chunk on the stack.

#[inline(never)]
fn consume_string(s: String) {
    drop(s);
}

fn main() {
    println!(
        "String occupies {} bytes on the stack.",
        std::mem::size_of::<String>()
    );

    {
        let s = String::from("hello");
        println!("s at {:p}", &s);
        consume_string(s);
    }

    let r = String::from("world");
    println!("r at {:p}", &r);
    consume_string(r);
}

The code above gives the output below.

String occupies 24 bytes on the stack.
s at 0x7ffee2ca31f8
r at 0x7ffee2ca31f8

I thought that the curly brackets should not make any difference, because the lifetime of s ends after the call to consume_string(s) and its drop handler is called within consume_string(). Why does adding the curly brackets enable the optimization?
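One way to check where the destructor actually runs is a type whose Drop impl records that it fired. The sketch below is not from the original post; the Noisy type, the consume() function, and the DROPPED flag are made-up names for illustration. It confirms that when a value is moved into a callee, its destructor runs inside the callee, before the caller continues:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set to true by Noisy's destructor.
static DROPPED: AtomicBool = AtomicBool::new(false);

// A type whose destructor records that it ran.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

#[inline(never)]
fn consume(n: Noisy) {
    drop(n); // the destructor runs here, inside the callee
}

// Returns true if the destructor had already run by the time consume() returned.
fn dropped_inside_callee() -> bool {
    DROPPED.store(false, Ordering::SeqCst);
    let n = Noisy;
    consume(n);
    DROPPED.load(Ordering::SeqCst)
}

fn main() {
    assert!(dropped_inside_callee());
    println!("destructor ran inside consume(), before the caller continued");
}
```

So the destructor timing matches the reasoning above; the open question is only whether the stack slot is released as well.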

The version of the Rust compiler I am using is given below.

rustc 1.54.0-nightly (5c0292654 2021-05-11)
binary: rustc
commit-hash: 5c029265465301fe9cb3960ce2a5da6c99b8dcf2
commit-date: 2021-05-11
host: x86_64-apple-darwin
release: 1.54.0-nightly
LLVM version: 12.0.1

Update 2: I would like to clarify the focus of this question. I want to know which of the following categories the proposed "stack reuse optimization" falls into.

  1. This is an invalid optimization. In certain cases the compiled code may fail if we perform the "optimization".
  2. This is a valid optimization, but the compiler (including both rustc frontend and llvm) is not capable of performing it.
  3. This is a valid optimization, but is temporarily turned off, like this.
  4. This is a valid optimization, but is missed. It will be added in the future.
Zhiyao
  • cause you use the address of it – Stargateur May 12 '21 at 09:13
  • @Stargateur you're pretty much always going to use the address of it tho e.g. calling `len()` (without even using its result) or `clear()` is sufficient for LLVM to decide each string needs its own stack slot: https://godbolt.org/z/5bqPK7E7q (you can add / remove occurrences and see the stack frame size vary, though it only changes by either 16 or 32 bytes, I guess for alignment reasons). – Masklinn May 12 '21 at 09:48
  • Does this answer your question? [Moved object doesn't have same address](https://stackoverflow.com/questions/60108691/moved-object-doesnt-have-same-address) – E_net4 May 12 '21 at 09:49
  • @E_net4thejanitor Thank you, but it doesn't. Just as the answer there points out: "In an optimized build the compiler might well realize that [...] can reuse the space [...]". – Zhiyao May 12 '21 at 10:19
  • @Stargateur Could you further explain why using the address of them prevents the optimization? What can go wrong if the optimization is applied? – Zhiyao May 12 '21 at 10:21
  • I guess there is a good reason. BTW, your link proves nothing; you disallow a lot of optimization with inline(never) – Stargateur May 12 '21 at 10:27
  • 3
    As stated in the _rest_ of the answer in the linked question, it is up to LLVM to choose whether to reuse the address space for different objects in memory, and observing addresses to values in a stack can influence the compilation output. The Rust compiler itself does not impose one behavior or the other. – E_net4 May 12 '21 at 10:29
  • And so at the end of the day this is mostly an issue to check for in / report to LLVM, unless it occurs specifically because MIR yields "bad" LLVM-IR (I really have no idea). Comparing to similar C++ code could be informative there, but I don't know how you'd ensure a `std::string` is "dead", as moved-from values are always valid. – Masklinn May 12 '21 at 10:45
  • I have voted "Needs focus". I was tempted to dupehammer, but I think this *could* be a good question if it were narrowed to focus only on one aspect of code generation. As Stargateur and E_net4 pointed out already, printing the address *does* inhibit LLVM from optimizing the stack, so to see what it really does behind the scenes you need to remove those `println!`s and either use a debugger or inspect the assembly code. – trent May 12 '21 at 11:01
  • Assuming that optimization isn't what you want to focus on, you could focus the example on the effect of curly brackets, which does strike me as a potentially interesting line of inquiry -- I'm not sure how those are treated differently (than just consuming `s`) within rustc or LLVM. – trent May 12 '21 at 11:04
  • 1
    @trentcl I'll push back again on the idea that printing should drop the optimisation: taking references is absolutely ubiquitous in rust, most method calls will do that. If that's sufficient to deopt then we have a problem (if not a big one). Though Emoun's investigation below seems to hint that the issue might be elsewhere. – Masklinn May 12 '21 at 12:04
  • I didn't say "cause you use the reference of it" but "cause you use the address of it"; that's a big difference. You ask for the raw address, which implies guarantees added to your variable. – Stargateur May 12 '21 at 12:09
  • 1
    @Masklinn You are mistaken. Taking a reference by itself does not inhibit optimizations because the address of the object does not have any observable effect on the behavior of the code. Printing or otherwise observing the address of an object directly *does* inhibit optimizations because the optimizer must use non-local reasoning to conclude that "any" value is allowed to be printed. – trent May 12 '21 at 12:20
  • In any case, I don't disagree that LLVM *could* do optimizations regardless, but the fact remains that it *doesn't* -- which undermines the premise of the question, since there's in fact no evidence in the question that the stack optimization *wouldn't* be done if the addresses weren't being directly observed. – trent May 12 '21 at 12:23
  • @trentcl Could you elaborate on how printing out the address is different? I thought that from the compiler's perspective, printing the address out is merely taking the reference of it and passing it to some function that does some output based on the binary value. – Zhiyao May 12 '21 at 12:39
  • We can compare [observing vs. not observing the address](https://godbolt.org/z/93Moj7184) by replacing `println!` with a black box. The not-observing code uses less than half the stack space and does at a cursory glance appear to reuse the same 3 locations for the two `String`s. However, using a black box *also* inhibits optimization, so if you were to try to compare that to another blackboxed version that uses `s` by reference -- say one which indexes the string -- you'd get as bad or worse codegen. But that doesn't mean that "real-world" code wouldn't also be optimized. – trent May 12 '21 at 12:51
  • The tl;dr is that *inserting debugging code does affect optimizations*, so you can't use this example as a basis for asking why the optimization is not done unless you *really do* care about the behavior of the debugging code (and not just the code being debugged). – trent May 12 '21 at 12:53
  • @trentcl In the link you provided, in `no_observe()`, `s` and `r` aren't stored on the stack at all. They are optimized away. The colored lines are preparing arguments for `consume_string()`. – Zhiyao May 12 '21 at 13:10
  • Yes... that's what optimizations do? What's your point? – trent May 12 '21 at 13:11
  • @trentcl My point is, when an object *needs* to be stored on the stack, but from some time point it is no longer needed, why that space is not reused thereafter? – Zhiyao May 12 '21 at 13:14
  • And *my* point is, **how do you know that space is not reused?** The example in the question contains debugging code that inhibits stack optimizations. Unless your concern *really is* the debugging code itself, you have not provided an example of a program where (1) an object *needs* to be stored on the stack, (2) the space *can* be reused without altering the observable behavior of the program, but (3) the space is nevertheless *not* reused. Inserting debugging code that explicitly depends on the representation of a `&String` fails criterion (2) which makes it ineligible for optimization. – trent May 12 '21 at 13:20
  • 1
    @trentcl I just created another example. https://godbolt.org/z/TEY76Wsjj In this example, (1) `.push_str()` forces the String instances to occupy space on the stack (2) no observable behavior is altered because nothing is observable (3) the space is not reused after `s` ends its lifetime – Zhiyao May 12 '21 at 13:50

2 Answers


My TLDR conclusion: A missed optimization opportunity.

So the first thing I did was look into whether your consume_string function actually makes a difference. To do this, I created the following, slightly more minimal, example:

struct Obj([u8; 8]);
fn main() {
    println!(
        "Obj occupies {} bytes on the stack.",
        std::mem::size_of::<Obj>()
    );

    let s = Obj([1,2,3,4,5,6,7,8]);
    println!("{:p}", &s);
    std::mem::drop(s);
    
    let r = Obj([11,12,13,14,15,16,17,18]);
    println!("{:p}", &r);
    std::mem::drop(r);
}

Instead of consume_string I use std::mem::drop, which is dedicated to simply consuming an object. This code behaves just like yours:

Obj occupies 8 bytes on the stack.
0x7ffe81a43fa0
0x7ffe81a43fa8

Removing the drop doesn't affect the result.

So the question is then why rustc doesn't notice that s is dead before r goes live. As your second example shows, enclosing s in a scope will allow the optimization.

Why does this work? Because Rust's semantics dictate that an object is dropped at the end of its scope. Since s is in an inner scope, it is dropped when that inner scope exits. Without the scope, s is alive until the main function exits.
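The drop-at-end-of-scope rule is easy to observe with a destructor that logs. The sketch below is mine, not from the question; the Tracked type and drop_order() function are made-up names. The value in the inner scope is destroyed before the "checkpoint" entry, while the later binding outlives it:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A guard that logs its name when dropped.
struct Tracked(&'static str, Rc<RefCell<Vec<&'static str>>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.1.borrow_mut().push(self.0);
    }
}

// Records drop events relative to a checkpoint.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Tracked("a", log.clone());
    } // `_a` is dropped here, at the end of the inner scope
    let _b = Tracked("b", log.clone());
    log.borrow_mut().push("checkpoint");
    // `_b` is still alive here; it is dropped only when this function's scope ends
    let snapshot = log.borrow().clone();
    snapshot
}

fn main() {
    assert_eq!(drop_order(), vec!["a", "checkpoint"]);
}
```

This is only about when the destructor runs; whether the compiler then reuses the freed stack slot is the separate, LLVM-level question discussed next.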

Why doesn't it work when moving s into a function, where it should be dropped on exit? Probably because rustc doesn't correctly flag the memory location used by s as free after the function call. As has been mentioned in the comments, it is LLVM that actually handles this optimization (called 'stack coloring', as far as I can tell), which means rustc must correctly tell it when the memory is no longer in use. Clearly, from your last example, rustc does this on scope exit, but apparently not when an object is moved.

Emoun
  • I wouldn't call it an obvious optimization; it's more about whether you want to use as little stack as possible or not. The code runs at the same speed anyway. – Stargateur May 12 '21 at 11:37
  • 5
    @Stargateur If you take cache into account, smaller memory footprint gives better locality and less cache miss, so the code will run faster. Also, on embedded systems, RAM is rare, the optimization can make a difference. – Zhiyao May 12 '21 at 11:48
  • 5
    In this specific case it probably doesn't matter much, using more/less stack is likely affect performance in the real-world so larger functions could benefit from optimization as this. Also, this optimization is [enabled in LLVM for -O0 and above](https://llvm.org/doxygen/TargetPassConfig_8cpp_source.html#l01088), so they have clearly decided it is almost always worth it. – Emoun May 12 '21 at 11:49
  • Also matters even less because the stack use is completely sequential, so even if it takes more cache lines, the result would mostly be evicting cache lines, not cache misses, because the data is not being read "far away" from its writing. – Masklinn May 12 '21 at 11:59
  • 2
    Evicting a cache line might cause other data reads/writes to miss. E.g. the function could evict a cache line used by its caller, which means the caller might later get a miss. – Emoun May 12 '21 at 12:04
  • 3
    I filed https://github.com/rust-lang/rust/issues/85230 about this. – Jeff Muizelaar May 12 '21 at 15:09

I think drop() does not free the stack memory of s; it only runs the destructor. In the first case, s still occupies its stack slot, so Rust cannot reuse it. In the second case, because of the {} scope, the slot is released when the scope ends, so the stack memory is reused.

Peace