2

I'm trying to write a function that will encapsulate a series of chained iterator method calls (.lines().map(...).filter(...)) which I currently have duplicated. I can't figure out the type signatures to get this to compile. If this is impossible or highly unidiomatic for Rust, I'm open to suggestions for an idiomatic approach.

use std::fs;
use std::io;
use std::io::prelude::*;
use std::iter;

const WORDS_PATH: &str = "/usr/share/dict/words";

fn is_short(word: &String) -> bool {
    word.len() < 7
}

fn unwrap(result: Result<String, io::Error>) -> String {
    result.unwrap()
}

fn main_works_but_code_dupe() {
    let file = fs::File::open(WORDS_PATH).unwrap();
    let reader = io::BufReader::new(&file);
    let count = reader.lines().map(unwrap).filter(is_short).count();
    println!("{:?}", count);

    let mut reader = io::BufReader::new(&file);
    reader.seek(io::SeekFrom::Start(0));
    let sample_size = (0.05 * count as f32) as usize; // 5% sample

    // This chain of iterator logic is duplicated
    for line in reader.lines().map(unwrap).filter(is_short).take(sample_size) {
        println!("{}", line);
    }
}

fn short_lines<'a, T>
    (reader: &'a T)
     -> iter::Filter<std::iter::Map<std::io::Lines<T>, &FnMut(&str, bool)>, &FnMut(&str, bool)>
    where T: io::BufRead
{
    reader.lines().map(unwrap).filter(is_short)
}

fn main_dry() {
    let file = fs::File::open(WORDS_PATH).unwrap();
    let reader = io::BufReader::new(&file);
    let count = short_lines(reader).count();
    println!("{:?}", count);

    // Would like to do this instead:
    let mut reader = io::BufReader::new(&file);
    reader.seek(io::SeekFrom::Start(0));
    let sample_size = (0.05 * count as f32) as usize; // 5% sample
    for line in short_lines(reader).take(sample_size) {
        println!("{}", line);
    }
}


fn main() {
    main_works_but_code_dupe();
}
Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Peter Lyons
  • 142,938
  • 30
  • 279
  • 274
  • 1
    A workaround is to use type aliases for the chained iterator, I think. I don't think returning abstract types (which in this case would be returning an iterator from `short_lines` is possible yet in stable rust. Check out the discussion on `impl Trait`for this (here's the tracking issue: https://github.com/rust-lang/rust/issues/34511). You can already do some of this using the nightly toolchain – bow Nov 09 '17 at 14:58

2 Answers2

5

I can't figure out the type signatures to get this to compile.

The compiler told you what it was.

error[E0308]: mismatched types
  --> src/main.rs:35:5
   |
35 |     reader.lines().map(unwrap).filter(is_short)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found fn item
   |
   = note: expected type `std::iter::Filter<std::iter::Map<_, &'a for<'r> std::ops::FnMut(&'r str, bool) + 'a>, &'a for<'r> std::ops::FnMut(&'r str, bool) + 'a>`
              found type `std::iter::Filter<std::iter::Map<_, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String {unwrap}>, for<'r> fn(&'r std::string::String) -> bool {is_short}>`

Now, granted, you can't just copy+paste this directly. You have to replace the _ type with the actual one you already had (it left it out because it was already correct). Secondly, you need to delete the {unwrap} and {is_short} bits; those are because function items have unique types, and that's how the compiler annotates them. Sadly, you can't actually write these types out.

Recompile and...

error[E0308]: mismatched types
  --> src/main.rs:35:5
   |
32 |      -> std::iter::Filter<std::iter::Map<std::io::Lines<T>, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String>, for<'r> fn(&'r std::string::String) -> bool>
   |         -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- expected `std::iter::Filter<std::iter::Map<std::io::Lines<T>, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String>, for<'r> fn(&'r std::string::String) -> bool>` because of return type
...
35 |     reader.lines().map(unwrap).filter(is_short)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected fn pointer, found fn item
   |
   = note: expected type `std::iter::Filter<std::iter::Map<_, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String>, for<'r> fn(&'r std::string::String) -> bool>`
              found type `std::iter::Filter<std::iter::Map<_, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String {unwrap}>, for<'r> fn(&'r std::string::String) -> bool {is_short}>`

Remember what I said about function items have unique types? Yeah, that. To fix that, we cast from a function item to a function pointer. We don't even need to specify what we're casting to, we just have to let the compiler know we want it to do a cast.

fn short_lines<'a, T>
    (reader: &'a T)
     -> std::iter::Filter<std::iter::Map<std::io::Lines<T>, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String>, for<'r> fn(&'r std::string::String) -> bool>
    where T: io::BufRead
{
    reader.lines().map(unwrap as _).filter(is_short as _)
}
error[E0308]: mismatched types
  --> src/main.rs:41:29
   |
41 |     let count = short_lines(reader).count();
   |                             ^^^^^^ expected reference, found struct `std::io::BufReader`
   |
   = note: expected type `&_`
              found type `std::io::BufReader<&std::fs::File>`
   = help: try with `&reader`

Again, the compiler tells you exactly what to do. Make the change and...

error[E0507]: cannot move out of borrowed content
  --> src/main.rs:35:5
   |
35 |     reader.lines().map(unwrap as _).filter(is_short as _)
   |     ^^^^^^ cannot move out of borrowed content

Right, that's because you got the input of short_lines wrong. One more change:

fn short_lines<T>
    (reader: T)
     -> std::iter::Filter<std::iter::Map<std::io::Lines<T>, fn(std::result::Result<std::string::String, std::io::Error>) -> std::string::String>, for<'r> fn(&'r std::string::String) -> bool>
    where T: io::BufRead
{
    reader.lines().map(unwrap as _).filter(is_short as _)
}

And now all you have to deal with are warnings.

In short: read the compiler messages. They're useful.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
DK.
  • 55,277
  • 5
  • 189
  • 162
  • I think the `'r` lifetime isn't needed and could be removed. – Stefan Nov 09 '17 at 15:11
  • 2
    Thanks for this! Please consider that I'm learning rust and long complex type signatures are new to me and just because I understand basically "the types are wrong" I can't necessarily understand a 100-character type signature yet and how to interpret the compiler messages meaningfully. – Peter Lyons Nov 09 '17 at 15:12
  • @Stefan yes I confirm `'r` is not required. – Peter Lyons Nov 09 '17 at 15:19
3

I'd suggest doing it the simple way - collecting the iterator into a vector:

use std::fs;
use std::io;
use std::io::prelude::*;

const WORDS_PATH: &str = "/usr/share/dict/words";

fn main() {
    let file = fs::File::open(WORDS_PATH).unwrap();
    let reader = io::BufReader::new(&file);
    let short_lines = reader.lines()
                            .map(|l| l.unwrap())
                            .filter(|l| l.len() < 7)
                            .collect::<Vec<_>>(); // the element type can just be inferred
    let count = short_lines.len();
    println!("{:?}", count);

    let sample_size = (0.05 * count as f32) as usize; // 5% sample

    for line in &short_lines[0..sample_size] {
        println!("{}", line);
    }
}
ljedrz
  • 20,316
  • 4
  • 69
  • 97
  • 1
    @MihailsStrasuns I make no assumptions about the constraints of the asker, nor was it the point of the question. With today's computers memory is rarely an issue and for a simple task like this it might be a totally valid solution. How about an alternative perspective: the fact how easy are Rust users with casually reading the same file twice (while doing things in memory is an order of magnitude faster) is one of the biggest problems with the language ecosystem :P. – ljedrz Nov 09 '17 at 19:00
  • 1
    @MihailsStrasuns hahaha! I'd actually say it's the other way — many Rustaceans are *overly* worried about writing efficient code but neglecting programmer efficiency! I'd love to be pointed at some references you have where people are overly using memory allocation. Rust's references and lifetimes actually make it much *easier* to pass references around, avoiding "just in case" allocations. – Shepmaster Nov 09 '17 at 19:07
  • @Shepmaster you don't have to look far - the very https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines allocates each line as new string. And the fact that this was considered acceptable for standard library primitive tells a lot about dominant attitude. Micro-optimizations are bad but here we are talking about O(n) memory allocations for a task that inherently requires O(1) memory footprint (and a single file pass) and can be expressed efficiently with a similar amount of code. For small specializeds app that can result in huge performance penalty. – Mihails Strasuns Nov 10 '17 at 10:17
  • @MihailsStrasuns yes, there's [a current language limitation about returning references into the iterator itself](https://stackoverflow.com/q/30422177/155423), so unfortunately that's the only way to write the iterator version *right now*. However, [`BufRead::read_line`](https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line) exists on the exact same trait (so it's equally easy to find) and gives you exactly the performance characteristics that you are seeking. – Shepmaster Nov 10 '17 at 14:23
  • I know about the limitation, even wrote blog post once investigating how it affects difference between Rust iterators and D ranges. My point is that language has happily reached 1.0 with such fundamental design problem being known - it wasn't considered important enough to put major effort into it in time. And by now it is too late because libraries are already getting written. There is no inherent reason why `lines` can't be as efficient as `read_line`. It is simply not considered important enough and that is extremely sad. – Mihails Strasuns Nov 10 '17 at 15:48
  • 2
    @MihailsStrasuns *it is simply not considered important enough* — then I'm sure you will be pleased to hear that the Rust community [is working hard on making it possible](https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md). Yes, it's disappointing that it wasn't in Rust 1.0, but major projects have been built without it, and I'd say having a thriving ecosystem is more important than theoretical purity. I'll disagree that it's "too late" as I strongly believe that the community will pick up new tools and techniques and use them effectively. – Shepmaster Nov 10 '17 at 16:04