
Am I right to assume that the only thing that "slows down" Rcs is that it checks whether to deallocate the object when it drops? Besides that, "how much" is the overhead of dereferencing a Rc, i.e. should I be concerned about it?
Are those two functions almost equally fast? Or is there a notable difference in speed?

use std::rc::Rc;

fn test_with_box() {
    let b = Box::new(1.0);
    let x = *b * 2.0;
}

fn test_with_rc() {
    let rc = Rc::new(1.0);
    let x = *rc * 2.0;
}

After all, the referenced object in test_with_rc() only ever has one reference, so within that function it behaves just like a Box (viewed from outside, not internally, of course).

I suspect that Rcs are actually faster than I think.

PS: When talking about "fast" I mean both dereferencing and allocating/deallocating.
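To make the single-reference case concrete (this example is an illustration added here, not part of the original question): an Rc whose strong count is 1 owns its value just like a Box does, and std::rc even lets you take the value back out with Rc::try_unwrap:

```rust
use std::rc::Rc;

fn main() {
    let rc = Rc::new(1.0);
    // Exactly one strong reference exists, just like a Box.
    assert_eq!(Rc::strong_count(&rc), 1);
    // try_unwrap succeeds only when the strong count is 1,
    // returning the inner value by move.
    let value = Rc::try_unwrap(rc).unwrap();
    assert_eq!(value, 1.0);
}
```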

Kapichu

  • As always with performance questions, the only way to know is to profile it in-situ. Micro-benchmarks are not suitable. – Matthieu M. Jul 07 '15 at 09:56
  • But it's still interesting to know if the overhead is in the range of a few cycles, a few microseconds, a few milliseconds etc :-) . And Rc is apparently more in the "a few cycles" range. – avl_sweden Apr 22 '18 at 19:35

2 Answers


Rc<T> is very, very cheap. It's not as cheap as a bare T (boxing values is comparatively expensive in micro-optimisation terms), but it is scarcely less efficient than Box<T>.

It’s just like Box, but with an additional couple of words for the strong and weak reference counts. The only operations that need to touch those counts are:

  • creating an Rc (initialises both counts),
  • cloning an Rc (increments the strong count),
  • dropping an Rc (decrements the strong count and runs the destructor if it reaches zero), and
  • downgrading to / upgrading from Weak (increments one of the two refcounts and decrements the other).

Dereferencing is a simple memory operation just like it is with Box.
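The counting behaviour described above can be sketched as follows (a small illustration using the current std::rc API, not part of the original answer):

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(1.0);      // strong = 1, weak = 0
    let b = Rc::clone(&a);     // strong = 2
    let w = Rc::downgrade(&a); // weak = 1
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::weak_count(&a), 1);

    drop(b); // strong back to 1; value not yet destroyed
    assert_eq!(Rc::strong_count(&a), 1);

    // Upgrading succeeds while any strong reference is alive.
    assert!(w.upgrade().is_some());

    drop(a); // strong = 0: the value is dropped
    assert!(w.upgrade().is_none());
}
```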

Chris Morgan
  • I would just like to add that downgrade/upgrade was stabilized in [Version 1.4.0, 2015-10-29](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-140-2015-10-29) – Zenny Mar 03 '20 at 23:29

To answer your question, you can turn to the test crate, which provides a benchmarking harness (this requires a nightly compiler):

#![feature(test)]
extern crate test;

use std::rc::Rc;

fn test_with_box() {
    let b = Box::new(1.0);
    test::black_box(*b * 2.0);
}

fn test_with_rc() {
    let rc = Rc::new(1.0);
    test::black_box(*rc * 2.0);
}

#[bench]
fn bench_box(b: &mut test::Bencher) {
    b.iter(test_with_box)
}

#[bench]
fn bench_rc(b: &mut test::Bencher) {
    b.iter(test_with_rc)
}

Now compiling & running this:

$ rustc --test -O rc.rs && ./rc --bench

running 2 tests
test bench_box ... bench:          22 ns/iter (+/- 0)
test bench_rc  ... bench:          22 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

What happened? The Rc apparently was completely compiled out. As it should be, because we haven't cloned it. So changing the respective fn to:

fn test_with_rc() {
    let rc = Rc::new(1.0);
    test::black_box(*rc.clone() * 2.0);
}

We get the following:

running 2 tests
test bench_box ... bench:          23 ns/iter (+/- 1)
test bench_rc  ... bench:          25 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

So, suffice to say, you'll probably need to worry about other things before looking at Rc-induced overhead.
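On stable Rust the same comparison can be roughly sketched without the nightly test crate, using std::hint::black_box (stable since Rust 1.66) and std::time::Instant. This is a crude micro-benchmark for illustration only, not a substitute for a proper harness, and it is subject to the same in-situ profiling caveat raised in the comments above:

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::time::Instant;

fn main() {
    const N: u32 = 10_000_000;

    // black_box prevents the optimiser from deleting the work entirely.
    let t = Instant::now();
    for _ in 0..N {
        let b = Box::new(black_box(1.0));
        black_box(*b * 2.0);
    }
    let box_time = t.elapsed();

    let t = Instant::now();
    for _ in 0..N {
        let rc = Rc::new(black_box(1.0));
        black_box(*rc.clone() * 2.0);
    }
    let rc_time = t.elapsed();

    println!("Box: {:?}, Rc: {:?}", box_time, rc_time);
}
```

Build with optimisations (`cargo run --release` or `rustc -O`), or the timings are meaningless.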

Shepmaster
llogiq

  • *The RC apparently was completely compiled out* I don't think you can claim that it is optimized out. If you look at the [LLVM IR](http://is.gd/AB68e3), you can see that `test_with_box` calls `@je_mallocx(i64 8, i32 0)` and `test_with_rc` calls `@je_mallocx(i64 24, i32 0)`, so the two blocks wouldn't be the same assembly. There's also 4 more instructions for the `Rc` case. – Shepmaster Jul 07 '15 at 14:45