5

What is the difference between filter(|x| func(x)) and filter(func)? Perhaps a good place to start would be to understand how filter(func) could be written using syntax akin to filter(|x| func(x)). My code looks like this:

fn filter_out_duplicates(vec_of_vecs: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
  vec_of_vecs
     .into_iter()
     .filter(all_unique)
     .collect()
}

pub fn all_unique<T>(iterable: T) -> bool
where
   T: IntoIterator,
   T::Item: Eq + Hash,
{
   let mut unique = HashSet::new();
   iterable.into_iter().all(|x| unique.insert(x))
}
error[E0599]: the method `collect` exists for struct `Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>`, but its trait bounds were not satisfied
  --> src/main.rs:44:56
   |
44 |             vec_of_vecs.into_iter().filter(all_unique).collect()
   |                                                        ^^^^^^^ method cannot be called on `Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>` due to unsatisfied trait bounds
   |
  ::: /.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/adapters/filter.rs:15:1
   |
15 | pub struct Filter<I, P> {
   | ----------------------- doesn't satisfy `_: Iterator`
   |
   = note: the following trait bounds were not satisfied:
           `<fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>} as FnOnce<(&Vec<u8>,)>>::Output = bool`
           which is required by `Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>: Iterator`
           `fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}: FnMut<(&Vec<u8>,)>`
           which is required by `Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>: Iterator`
           `Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>: Iterator`
           which is required by `&mut Filter<std::vec::IntoIter<Vec<u8>>, fn(&Vec<u8>) -> bool {tmp::all_unique::<&Vec<u8>>}>: Iterator`
           which is required by `&mut Filter<std::vec::IntoIter<Vec<Placement>>, fn(&Vec<Placement>) -> bool {all_unique::<&Vec<Placement>>}>: Iterator`

But the code compiles if I use |x| all_unique(x). I know deciphering compiler errors is the recommended way of solving problems in Rust but I find this error pretty impenetrable.

I found a discussion that seemed to commiserate about the error more than explain coercions but I found the chapter on coercions in the Rustonomicon too short to provide understanding.

kmdreko
  • 42,554
  • 6
  • 57
  • 106
financial_physician
  • 1,672
  • 1
  • 14
  • 34
  • Thank you. So now I understand that `filter(|x| func(x))` will implicitly dereference x in my case (it gets transcribed as `filter(|x:&Vec| func(*x))`) but why does `filter(func)` consume the variable while the former does not? How does `filter(func)` move the values in `vec_of_vecs` from behind the reference and consume them? – financial_physician Sep 23 '22 at 07:00
  • (your comments disappeared but I thought they were insightful haha) – financial_physician Sep 23 '22 at 07:01
  • My comments where incorrect, I played around with your example and it seems to have been caused by something else entirely. Sorry for the confusion – mousetail Sep 23 '22 at 07:05
  • 2
    @mousetail See [playgound](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c3f1efb1c7112cc5427d8bd77378a1eb) for what I believe to be the problem. It seems to be related to lifetime – YthanZhang Sep 23 '22 at 07:08
  • @YichyZhang Why would we expect the 2 different syntaxes to change the lifetime? – financial_physician Sep 23 '22 at 07:30
  • Related Q&A: [What's the difference between `.map(f)` and `.map(|x| f(x))`?](/q/67864911/2189130) but the solution there *was* due to type coercion. – kmdreko Sep 24 '22 at 16:29

2 Answers2

6

This case is not related to coercions. This is another case of late-bound vs. early-bound lifetimes.

Rust has two kinds of lifetimes: early-bound and late-bound. The difference boils down to it is decided what lifetime to use.

For late bound lifetimes, you get a Higher-Ranked Trait Bound - something like for<'a> fn(&'a i32). Then, a lifetime is picked only when the function is called.

For early-bound lifetimes, on the other hand, you get fn(&'some_concrete_lifetime i32). The lifetime may be inferred, sometimes omitted, but it's there. And it has to be decided at the time we decide the type for the function pointer/item.

filter() expects a HRTB function, that is, with a late bound lifetime. This is because the desugaring for FnMut(&Self::Item) -> bool, which is the bound in filter(), is for<'a> FnMut(&'a Self::Item) -> bool, or, if you wish, for<'a> FnMut<(&'a Self::Item,), Output = bool>.

Your all_unique(), however, is generic over T: IntoIterator. And if we set T = &'a Vec<u8>, then 'a is early bound. This is because lifetimes from generic parameters are always early bound - essentially, because we can't late-bind generic arguments, as there is no way in Rust to express for<T>, as generic type parameters are monomorphized and so this is generally impossible.

So, if we reveal the elided lifetimes, you want to satisfy the trait bound fn(&'some_lifetime Vec<u8>) -> bool: for<'all_lifetimes> FnMut(&'all_lifetimes Vec<u8>) -> bool, and this bound is false. This is the reason for the error you saw.

If we use a closure, however, we generate a closure that is specific for the type &'lifetime Vec<u8>. Since it is not generic over the type, the lifetime can be late bound.

Chayim Friedman
  • 47,971
  • 5
  • 48
  • 77
  • Great answer to something I'd been wondering for a while. Is there a performance difference between passing a function and a closure that calls the same function? Presumably it's negligible if there is. – Holloway Sep 23 '22 at 07:54
  • 2
    @Holloway There is, if it is not inlined. But it will probably be, and if not the perf impact is likely to be negligible. – Chayim Friedman Sep 23 '22 at 07:57
  • Thanks for the excellent explanation. My understanding is `.filter(|f| all_unique(f))` defers the monomorphization of `all_unique`. This allows the compiler to first determine the late-bound lifetime of `f`, and only then should it bother resolving the generic (early-bound) lifetimes of the `all_unique` call. – aedm Sep 23 '22 at 08:31
  • @aedm Yes, pretty much that. – Chayim Friedman Sep 23 '22 at 08:37
  • @Holloway I would expect such closures to always be inlined in release builds. Although there is no *guarantee* that such inlining must happen, that kind of optimization is the cornerstone of "zero cost abstractions" and heavily counted on. – user4815162342 Sep 23 '22 at 12:10
0

I'm not 100% sure about what it is happening here. It could even be considered a compiler limitation/bug, depending on your point of view, but I think that this is what happens:

When you write filter(all_unique), being all_unique a generic function, it is resolved as taking a reference to the item it is iterating upon, as per the definition of filter:

fn filter<P>(self, predicate: P) -> Filter<Self, P>ⓘ where
    P: FnMut(&Self::Item) -> bool, 

So you are actually calling all_unique<&Vec<u8>>. You may think that all is ok, because that &Vec<u8> actually implements IntoIterator and the other constraints.

But the issue is with lifetimes. See, when in filter there is the constraint P: FnMut(&Self::Item) -> bool that is actually syntactic sugar for P: for <'a> FnMut(&'a Self::Item) -> bool, that is the function must be able to accept any lifetime, and your function cannot.

But, wait! you may say that your function all_unique<T> certainly can take T: 'a for any lifetime 'a. And that is true, but that is not what is happening here: you are calling filter<P> with P=all_unique::<&'f Vec<u8>> being 'f that particular lifetime. And that lifetime is not any lifetime! Now your all_unique function is tainted with a particular lifetime and it does not satisfy the for <'a> ... thing above.

Surely, you do not actually need that for <a>` here because you are calling the function with the proper lifetime, but the syntax is what it is and it forces the error.

The obvious solution is to write all_unique to take a reference:

pub fn all_unique<T>(iterable: &T) -> bool

That is actually sintactic sugar for:

pub fn all_unique<'a, T>(iterable: &'a T) -> bool

where the universality of 'a (that for <'a> thing) is implicit.

And now calling filter(all_unique) selects the generic all_unique::<Vec<u8>>, that is untainted with lifetimes and can be called with any 'a.

And this is indeed is a syntactic limitation, you can just write:

pub fn all_unique<T> (iterable: T) -> bool { /* ... */ }

pub fn all_unique_ref<'a, T> (iterable: &'a T) -> bool {
    all_unique::<&T>(iterable)
}

And writing filter(all_unique_ref) will work while filter(all_unique) will not.

Your solution of using a lambda expression:

filter(|x| all_unique(x))

is just like that all_unique_ref but anonymous.

TL;DR; The original error is caused because the lifetime of the argument is captured in the type of the generic function instead of in the usage of that function. And that makes filter() unhappy because its argument does not look generic enough.

rodrigo
  • 94,151
  • 12
  • 143
  • 190
  • If I implement `all_unique_ref` that actually breaks the the function because into_iter() wants to consume the `T` and `T` doesn't implement copy – financial_physician Sep 23 '22 at 17:52