0

I'm implementing a code parser in a Parser struct. I am exposing a pub method lines to iterate over the lines of code with the comments removed. I want to return a Box<Iterator>

extern crate regex; // 1.0.5

use regex::Regex;

pub struct Parser {
    code: String,
}

static comment: Regex = Regex::new(r"//.*$").unwrap();

impl Parser {
    pub fn new(code: String) -> Parser {
        Parser { code }
    }

    pub fn lines(&self) -> Box<Iterator<Item = &str>> {
        let lines = self
            .code
            .split("\n")
            .map(|line| comment.replace_all(line, ""));
        Box::new(lines)
    }
}

However, the compiler is giving this error:

error[E0271]: type mismatch resolving `<[closure@src/lib.rs:20:18: 20:54] as std::ops::FnOnce<(&str,)>>::Output == &str`
  --> src/lib.rs:21:9
   |
21 |         Box::new(lines)
   |         ^^^^^^^^^^^^^^^ expected enum `std::borrow::Cow`, found &str
   |
   = note: expected type `std::borrow::Cow<'_, str>`
              found type `&str`
   = note: required because of the requirements on the impl of `std::iter::Iterator` for `std::iter::Map<std::str::Split<'_, &str>, [closure@src/lib.rs:20:18: 20:54]>`
   = note: required for the cast to the object type `dyn std::iter::Iterator<Item=&str>`

It wants me to use std::borrow::Cow, but I can't find anything in the Map docs mentioning this requirement. Why is this necessary? Can I avoid it?

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Andy Carlson
  • 3,633
  • 24
  • 43
  • 2
    Idiomatic Rust uses `snake_case` for variables, methods, macros, and fields; `UpperCamelCase` for types; and `SCREAMING_SNAKE_CASE` for statics and constants. Use `static COMMENT` instead, please. – Shepmaster Sep 18 '18 at 20:44

3 Answers3

5

It is highly recommended to read the documentation for all of the types and methods you are using. For example, Regex::replace_all is documented as:

pub fn replace_all<'t, R: Replacer>(
    &self, 
    text: &'t str, 
    rep: R
) -> Cow<'t, str>
//   ^^^^^^^^^^^^

That's where the Cow comes from.

It is impossible to return an iterator of &strs once you have allocated new strings; you will need to pick a new iterator type. Something like this seems possible, but since your code doesn't compile for reasons other than this lifetime issue, I cannot easily test it.

pub fn lines<'a>(&'a self) -> Box<dyn Iterator<Item = Cow<'a, str>> + 'a>

See also:

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • 2
    Thanks for the recommendation regarding the docs. I'm coming over from JS, which seldom has its objects well documented. So I'm used to just introspecting in a REPL to find out what fields the objects have. I sort of brought that habit with me into Rust. But now, I've been developing my Rust with the docs close at hand from now on, and it's a much better experience! Thanks! – Andy Carlson Sep 19 '18 at 18:44
1

Here is the best solution for my case.

replace_all is not a good method for this use case. I just want to remove comments. I never need to insert anything into the string. If so, I should just be able to work with string slices. No need for the Cow type introduced by replace_all. Here is how I did it instead.

impl Parser {
    pub fn lines<'a>(&'a self) -> Box<dyn Iterator<Item = &'a str> + 'a> {
        let lines = self.code
            .lines()
            .map(|line| { line.split("//").next().unwrap() })
            .map(|line| line.trim())
            .filter(|line| line.len() > 0);

        Box::new(lines)
    }
}
Andy Carlson
  • 3,633
  • 24
  • 43
0

As you have already discovered, Cow comes from Regex::replace_all

In your case, there is a very dangerous and discouraged way to get an iterator of &str:

extern crate regex; // 1.0.5

use regex::Regex;
use std::borrow::Cow;

pub struct Parser {
    code: String,
}

impl Parser {
    pub fn new(code: String) -> Parser {
        Parser { code }
    }

    pub fn lines<'a>(&'a self, comment: Regex) -> Box<Iterator<Item = &'a str> + 'a> {
        let lines = self
            .code
            .split("\n")
            .map(move |line| comment.replace_all(line, ""))
            .map(|cow| match cow {
                Cow::Borrowed(sref) => sref,
                Cow::Owned(_) => panic!("I hope never to be here"),
            });
        Box::new(lines)
    }
}

fn main() {
    let comment: Regex = Regex::new(r"//.*$").unwrap();

    let p = Parser::new("hello\nworld".to_string());

    for item in p.lines(comment) {
        println!("{:?}", item);
    }
}

This works because there are no Cows that allocate strings.

A better way could be to return an iterator of Strings:

pub fn lines<'a>(&'a self, comment: Regex) -> Box<Iterator<Item = String> + 'a> {
    let lines = self
        .code
        .split("\n")
        .map(move |line| comment.replace_all(line, ""))
        .map(|cow| cow.into_owned());

    Box::new(lines)
}
Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
attdona
  • 17,196
  • 7
  • 49
  • 60
  • 2
    To be blunt, this is a *terrible* suggestion. If the `replace_all` call never needs to be run (and thus never needs to return `Cow::Owned`), then *it shouldn't be there*. Presumably OP actually does need to remove comments for some reason, thus this will always panic. It's also not "dangerous" at all. – Shepmaster Sep 19 '18 at 16:30
  • 1
    I also disagree that returning `String`s is universally "better"; now you have to re-allocate for every line, which seems like a big waste of performance. – Shepmaster Sep 19 '18 at 16:31
  • Indeed, I only need to remove comments. Copies should not be necessary. I think `replace_all` was a poor method choice on my part. I posted an answer for a much better approach. – Andy Carlson Sep 19 '18 at 18:46