Motivation

I want to read a stream of values from multiple files on disc. These might be CSV files, tab-separated files, or some proprietary binary format, so I want the function that handles reading multiple files to take the Path -> Iterator<Data> function as an argument. If I understand correctly, in Rust I need to box both the iterator and the function itself, since they're unsized. My reading function should therefore be (I'm just using i32 as a simple proxy for my data here):

fn foo(read_from_file: Box<dyn Fn(&Path) -> Box<dyn Iterator<Item=i32>>>) {
    panic!("Not implemented");
}

For testing, I'd rather not be reading actual files from disc. I'd like my test data to be right there in the test module. Here's roughly what I want, but I've just put it into the main of a bin project for simplicity:

use std::path::Path;

fn foo(read_from_file: Box<dyn Fn(&Path) -> Box<dyn Iterator<Item=i32>>>) {
    panic!("Not implemented");
}

fn main() {

    let read_from_file = Box::new(|path: &Path| Box::new(match path.as_os_str().to_str().unwrap() {
        "/my_files/data.csv" => vec![1, 2, 3],
        "/my_files/data_2.csv" => vec![4, 5, 6],
        _ => panic!("Invalid filename"),
    }.into_iter()));

    foo(read_from_file);
}

The error

This gives me a compilation error:

   Compiling iter v0.1.0 (/home/harry/coding/rust_sandbox/iter)
error[E0271]: type mismatch resolving `for<'r> <[closure@src/main.rs:9:35: 13:19] as FnOnce<(&'r Path,)>>::Output == Box<(dyn Iterator<Item = i32> + 'static)>`
  --> src/main.rs:15:9
   |
15 |     foo(read_from_file);
   |         ^^^^^^^^^^^^^^ expected trait object `dyn Iterator`, found struct `std::vec::IntoIter`
   |
   = note: expected struct `Box<(dyn Iterator<Item = i32> + 'static)>`
              found struct `Box<std::vec::IntoIter<{integer}>>`
   = note: required for the cast to the object type `dyn for<'r> Fn(&'r Path) -> Box<(dyn Iterator<Item = i32> + 'static)>`

For more information about this error, try `rustc --explain E0271`.
error: could not compile `iter` due to previous error

I don't really understand this. Doesn't std::vec::IntoIter implement Iterator, in which case I don't see why this is a type error?

The fix, which I also don't understand

If I add an explicit type annotation Box<dyn Fn(&Path) -> Box<dyn Iterator<Item=i32>>>, this compiles:

use std::path::Path;

fn foo(read_from_file: Box<dyn Fn(&Path) -> Box<dyn Iterator<Item=i32>>>) {
    panic!("Not implemented");
}

fn main() {

    let read_from_file : Box<dyn Fn(&Path) -> Box<dyn Iterator<Item=i32>>>
        = Box::new(|path: &Path| Box::new(match path.as_os_str().to_str().unwrap() {
        "/my_files/data.csv" => vec![1, 2, 3],
        "/my_files/data_2.csv" => vec![4, 5, 6],
        _ => panic!("Invalid filename"),
    }.into_iter()));

    foo(read_from_file);
}

I'm very confused about why this works. My understanding of Rust is that, in a let definition, the explicit type is optional unless the compiler cannot infer it, in which case the compiler should emit error[E0283]: type annotations needed.

Harry Braviner
  • Another option is to add a return type annotation on just the inner closure: `Box::new(|path: &Path| -> Box<dyn Iterator<Item = i32>> { Box::new(...) })` – PitaJ Nov 08 '21 at 17:47
  • I'm sure there's a duplicate somewhere that I can't find, but `Box<dyn Iterator>` and `Box<T>` (for a concrete iterator type `T`) are fundamentally different things in Rust and have different memory layouts. If you just box some instance of `Iterator`, it will **not** be a `Box<dyn Iterator>`. You have to cast it to one. – Aplet123 Nov 08 '21 at 18:09

2 Answers

Pointers to dynamically sized types (DSTs) like Box<dyn Iterator<Item=i32>> are "fat". A Box<std::vec::IntoIter<i32>> is not a pointer to a DST (as the size of IntoIter is known), and hence can be a "thin" pointer simply pointing to the instance of IntoIter on the heap.

The creation and usage of a fat pointer is more expensive than that of a thin pointer. This is why, as @Aplet123 mentioned, you need to explicitly tell the compiler somehow (via type annotations or an as cast) that you want to cast the thin Box<std::vec::IntoIter<i32>> pointer generated by your closure to a fat Box<dyn Iterator<Item=i32>> pointer.
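As a minimal sketch of what telling the compiler can look like: either an `as` cast on the boxed iterator or a return type annotation on the closure produces the fat pointer. The names `Reader`, `collect_from`, `with_cast` and `with_annotation` are made up for illustration, and the `Vec` data stands in for real file contents.

```rust
use std::path::Path;

// Alias for the argument type used by `foo` in the question.
type Reader = Box<dyn Fn(&Path) -> Box<dyn Iterator<Item = i32>>>;

// Hypothetical helper: calls the reader for one path and collects the values.
fn collect_from(read_from_file: Reader, path: &str) -> Vec<i32> {
    read_from_file(Path::new(path)).collect()
}

// Option 1: an explicit `as` cast from the thin Box to the fat Box.
fn with_cast() -> Vec<i32> {
    let read_from_file = Box::new(|_path: &Path| {
        Box::new(vec![1, 2, 3].into_iter()) as Box<dyn Iterator<Item = i32>>
    });
    collect_from(read_from_file, "/my_files/data.csv")
}

// Option 2: a return type annotation on the closure; the boxed value is
// coerced to the annotated type when it is returned.
fn with_annotation() -> Vec<i32> {
    let read_from_file = Box::new(|_path: &Path| -> Box<dyn Iterator<Item = i32>> {
        Box::new(vec![4, 5, 6].into_iter())
    });
    collect_from(read_from_file, "/my_files/data_2.csv")
}
```

In both cases the `let` binding itself needs no annotation, because the closure's return type is already the trait object by the time the outer `Box` is passed on.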

Note that if you remove the let binding and create the closure in the argument list of the foo function call, then the compiler infers that the closure must return a fat pointer, because of the argument type expected by foo.
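A sketch of that inline form, assuming a version of `foo` that collects the values for one path instead of panicking. That body and the wrapper name `read_inline` are assumptions for illustration, not part of the question.

```rust
use std::path::Path;

// Same argument type as in the question, but collecting the values for one
// path so there is something to observe (an assumption for illustration).
fn foo(read_from_file: Box<dyn Fn(&Path) -> Box<dyn Iterator<Item = i32>>>) -> Vec<i32> {
    read_from_file(Path::new("/my_files/data_2.csv")).collect()
}

// Hypothetical wrapper: the closure is created directly in the argument
// list, with no cast and no return type annotation.
fn read_inline() -> Vec<i32> {
    foo(Box::new(|path: &Path| {
        let values = match path.to_str().unwrap() {
            "/my_files/data.csv" => vec![1, 2, 3],
            "/my_files/data_2.csv" => vec![4, 5, 6],
            _ => panic!("Invalid filename"),
        };
        // The parameter type of `foo` supplies the expected return type,
        // so this Box is coerced to Box<dyn Iterator<Item = i32>>.
        Box::new(values.into_iter())
    }))
}
```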

EvilTak
  • This is also really helpful. Though I still don't really see why the compiler is able to correctly do the type inference if the closure is created in the argument list, but not if it is created just above. – Harry Braviner Nov 08 '21 at 20:12
  • Except for integer literals, the compiler must know what type a `let` binding is at declaration. If the closure is declared without annotations, the compiler assumes that the return type is `Box<std::vec::IntoIter<i32>>` for the reasons in the answer -- it cannot use future usages to determine the `let` binding's type, rather the future usages must agree with the binding's type at declaration. However, if the closure is created in the argument list, the compiler knows that the function expects a closure that returns a `Box<dyn Iterator<Item = i32>>`, and is able to perform the appropriate cast automatically. – EvilTak Nov 08 '21 at 21:59
  • A slight clarification: the compiler can infer types based on information that comes later on (e.g. `let mut x = vec![]; x.push("hello");` works), but it can't do that for the types of *closure parameters*, except for numeric types (floats work too) – cameron1024 Nov 09 '21 at 07:44

To me, this reads like a failure of type inference, since the closure is unable to infer that it needs to return a pointer to a v-table (from dyn Iterator).

However, I'd suggest that Box<dyn Foo> might not be necessary here. It's true that, since Iterator is a trait, you can't know the size of an arbitrary implementor at compile time, but in a sense, you can: each concrete iterator type has a known size.

Rust "monomorphizes" generic code, which means it generates copies of generic functions/structs/etc for each concrete type it is used with. For example, if you have:

struct Foo<T> {
  value: T
}

fn main() {
  let _ = Foo { value: "hello" };
  let _ = Foo { value: 123 };
}

It's going to generate a Foo_str_'static and a Foo_i32 (roughly speaking) and substitute those in as needed.

You can exploit this to use static dispatch with generics while using traits. Your function can be rewritten as:

use std::path::Path;

fn foo<F, I>(read_from_file: F)
where
  F: Fn(&Path) -> I,
  I: Iterator<Item = i32>,
{
  unimplemented!()
}

fn main() {
  // note the lack of boxing
  let read_from_file = |path: &Path| {
    // ...
  };

  foo(read_from_file);
}

This code is very probably faster (though I haven't benchmarked it) and more idiomatic, and it makes the compiler error go away.
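For completeness, here is a filled-in sketch of the generic version using the test data from the question. The extra `paths` parameter, the summing body of `foo`, and the wrapper name `sum_test_files` are assumptions made only so that the example does something observable.

```rust
use std::path::Path;

// Generic over the reader `F` and the iterator type `I` it returns, so
// nothing is boxed. Summing the values is a stand-in for real work.
fn foo<F, I>(read_from_file: F, paths: &[&str]) -> i32
where
    F: Fn(&Path) -> I,
    I: Iterator<Item = i32>,
{
    let mut total = 0;
    for path in paths {
        for value in read_from_file(Path::new(path)) {
            total += value;
        }
    }
    total
}

// Hypothetical test helper: the closure returns a plain
// `std::vec::IntoIter<i32>`, and `F` and `I` are inferred at the call site.
fn sum_test_files() -> i32 {
    let read_from_file = |path: &Path| {
        let values = match path.to_str().unwrap() {
            "/my_files/data.csv" => vec![1, 2, 3],
            "/my_files/data_2.csv" => vec![4, 5, 6],
            _ => panic!("Invalid filename"),
        };
        values.into_iter()
    };
    foo(read_from_file, &["/my_files/data.csv", "/my_files/data_2.csv"])
}
```

Here the `let` binding works without any annotation because the closure's inferred return type, `std::vec::IntoIter<i32>`, is exactly what the generic parameter `I` is instantiated with, so no coercion is needed.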

cameron1024
  • This makes sense. But I now have two types (`F` and `I`) that the function is generic over. In practice, `foo` is going to be the `new` method of a struct, and if I follow this approach it looks like I end up with my struct having to be generic over `F` and `I`. This makes for more boilerplate if I want to write the type of the struct. Is there a way around that? – Harry Braviner Nov 08 '21 at 20:10
  • In general, you can rely on type inference to pick up most of the slack. In the example I gave, although `foo` has the type parameters, we don't actually have to specify it, the compiler can infer it from context. In my experience, this is the "default", and it's only in special cases where explicit type parameters are necessary – cameron1024 Nov 08 '21 at 20:59
  • Ok, so I made this change in my actual code and it does indeed work. Am I strictly losing some flexibility by doing this, in that the type `F` must be known at compile time? But in return I'm probably gaining some performance, assuming `F` is called a lot, by no longer having the fat pointer? – Harry Braviner Nov 08 '21 at 22:06
  • Note that the use of the generic type parameter `I` for the return type will mean that the closure must always return the same _type_ of `Iterator` -- see [this chapter](https://doc.rust-lang.org/book/ch17-02-trait-objects.html) in the Rust book for more information. You are correct that generics make you lose flexibility but gain run time performance, as both `F` and `I` must be known at compile time. – EvilTak Nov 08 '21 at 23:07
  • Both `F` and `I` must be known at compile time. EvilTak's point is valid: you'd lose the ability to choose between 2 different types of iterators at runtime (unless you created some kind of `WrappingIterator` that wraps a `Box<dyn Iterator<Item = i32>>` or something similar). It all comes down to your needs though. If you find you're running into this issue frequently, and you don't *need* the performance, maybe moving towards a box-y solution is better. My experience is that "idiomatic" rust code tends to rely more heavily on generics than `Box`, but that's just my experience – cameron1024 Nov 09 '21 at 06:01
  • It's worth adding that `Iterator`s in general are places where the difference between static and dynamic dispatch can be more significant, since they are often called in relatively tight loops. But as with any performance question, I'd advise writing idiomatic code first, and refactoring if benchmarks show it's too slow – cameron1024 Nov 09 '21 at 06:03
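To make the trade-off discussed in these comments concrete, here is a hedged sketch of the boxed alternative for the struct case. `Reader`, `MultiFileReader` and `make_test_reader` are hypothetical names, and the data is made up. Because both the closure and the iterator are trait objects, the struct needs no type parameters, and the closure may return a different iterator type for each path.

```rust
use std::path::Path;

// Both the reader and the iterator it returns are trait objects.
type Reader = Box<dyn Fn(&Path) -> Box<dyn Iterator<Item = i32>>>;

// Hypothetical struct: not generic over `F` or `I`, so its type is easy to write.
struct MultiFileReader {
    read_from_file: Reader,
}

impl MultiFileReader {
    fn new(read_from_file: Reader) -> Self {
        MultiFileReader { read_from_file }
    }

    // Reads all values for one path through the boxed closure.
    fn read(&self, path: &str) -> Vec<i32> {
        (self.read_from_file)(Path::new(path)).collect()
    }
}

// The two match arms box two different concrete iterator types; the return
// type annotation makes each arm coerce to the same trait object type.
fn make_test_reader() -> MultiFileReader {
    MultiFileReader::new(Box::new(
        |path: &Path| -> Box<dyn Iterator<Item = i32>> {
            match path.to_str() {
                Some("/my_files/data.csv") => Box::new(vec![1, 2, 3].into_iter()),
                _ => Box::new(std::iter::repeat(0).take(2)),
            }
        },
    ))
}
```

The cost is one heap allocation and dynamic dispatch per returned iterator, which is the flexibility versus performance trade described above.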