
I like using partial application because, among other things, it lets me split up a complicated function call, which is more readable.

An example of partial application:

fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);

    println!("{}", add7(35));
}

Is there overhead to this practice?

Here is the kind of thing I like to do (from real code):

fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);

    things.extend(new_things);
}

This is purely a matter of readability: I do not like to nest things too deeply.

Boiethios
  • In general, LLVM does a very good job at removing the overhead of abstractions like that. But, if you aren't sure, usually the best thing to do is run some benchmarks (a sketch follows these comments)! – Peter Hall Oct 03 '17 at 16:22
  • As far as the latest compiler can go, that specific example would compile to `println!("{}", 42)`. And if you `black_box` the input to `add7`, you get `println!("{}", 35 + 7)`. You might need a more complex example in order to find a place where overhead does happen. :) – E_net4 Oct 03 '17 at 16:30
  • @E_net4 Please do not take this example literally. It was just a simplistic example of partial application. – Boiethios Oct 03 '17 at 16:32
  • Sure sure. On the other hand, we would like to know whether you have a less trivial use of partial application, and of course, whether you're thinking of an alternate implementation or a baseline (as in, something with the same capabilities but without closures for partial application). – E_net4 Oct 03 '17 at 16:37
  • @E_net4 Edited. – Boiethios Oct 03 '17 at 16:45
  • Your second example does nothing and gets optimized out by dead code elimination. Ultimately all optimizations (except specialization on the library level) depend on how far the compiler can see and how much it chooses to inline. Presence of trait objects (dynamic dispatch) can also hinder some optimizations. – the8472 Oct 03 '17 at 16:53
  • @the8472 Let me correct my example. – Boiethios Oct 03 '17 at 16:58
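
Following Peter Hall's suggestion above, here is a minimal benchmark sketch. It assumes the criterion crate as a dev-dependency (not part of the original question) and uses black_box, as E_net4 mentions, to keep the compiler from constant-folding the calls away:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn bench(c: &mut Criterion) {
    let add7 = |x| add(7, x);
    // Compare the partially applied closure against the direct call.
    c.bench_function("partial application", |b| b.iter(|| add7(black_box(35))));
    c.bench_function("direct call", |b| b.iter(|| add(black_box(7), black_box(35))));
}

criterion_group!(benches, bench);
criterion_main!(benches);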

2 Answers


There should not be a performance difference between defining the closure before it's used versus defining and using it directly. There is a type-system difference, though: the compiler doesn't fully know how to infer the types of a closure that isn't immediately called.

In code:

let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)

will be exactly the same as

things.clone().into_iter().flat_map(|thing| { 
    ThingMultiplier::new(thing, n)
})
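
As for the type-system difference: a stored closure's parameter types are pinned by its first call. A minimal sketch (not from the original answer):

fn main() {
    let f = |x| x;       // the type of `x` is not known yet
    let _a = f(1u8);     // the first call fixes `x: u8`
    // let _b = f(1u16); // error: expected `u8`, found `u16`
}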

In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.

The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, the result may even be faster than an equivalent call through a function pointer. As with any performance question, though, you still need to do normal profiling to see whether the closures are ever a bottleneck.
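
As a rough illustration of that desugaring, here is a hand-written stand-in (the AddN struct and its call method are invented names; the real Fn* impls are compiler-generated and cannot be written by hand on stable Rust):

fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Roughly what the compiler generates for a capturing closure like
// `let n = 7; let add_n = |x| add(n, x);`: a struct holding the
// captured environment.
struct AddN {
    n: i32, // the captured `n`
}

// An ordinary method stands in for the compiler-generated trait call
// (implementing Fn/FnMut/FnOnce by hand needs the unstable `fn_traits`
// feature).
impl AddN {
    fn call(&self, x: i32) -> i32 {
        add(self.n, x)
    }
}

fn main() {
    let add7 = AddN { n: 7 };
    println!("{}", add7.call(35)); // prints 42
}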

Shepmaster

In your particular example, yes: extend can get inlined as a loop containing another loop for the flat_map, which in turn just puts ThingMultiplier instances into the same stack slots that hold n and thing.

But you're barking up the wrong efficiency tree here. Instead of wondering whether the allocation of a small struct holding two fields gets optimized away, you should rather wonder how efficient that clone is, especially for large inputs.
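
For instance, here is a sketch of one way to avoid the up-front copy (it assumes Things is Clone and that ThingMultiplier yields Things, as the question implies, and it takes &mut Vec<Things> so the extended vector isn't just dropped):

fn foo(n: u32, things: &mut Vec<Things>) {
    // Clone each element lazily as the pipeline consumes it, instead of
    // materializing a full copy of the Vec up front.
    let new_things: Vec<Things> = things
        .iter()
        .cloned()
        .flat_map(|thing| ThingMultiplier::new(thing, n))
        .collect();

    things.extend(new_things);
}

This still clones every element; whether even that is necessary depends on whether ThingMultiplier really needs owned values.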

the8472