
I am working on a small lexer in Rust. I had the idea of putting the lexing phase into the implementation of the Iterator trait.

struct Lexer {
    text: String
}

impl Iterator for Lexer {
    ...
    fn next(&mut self) -> Option<LexItem>{
        ....
        // slicing issue
        self.text = self.text[i .. self.text.len()]

    }
}

I have not fully grokked lifetime management here. I would be fine with giving the struct a lifetime parameter for the text field, which would (probably) make the subslicing easier, yet I have failed to incorporate such a lifetime into my code. On the other hand, I have a hard time converting the slice self.text[i .. .....] back into a String (I don't know whether that is possible).

What I tried:

I first added a lifetime parameter to the struct and put the same lifetime on the receiver of next:

struct Lexer<'a> {
    text: &'a str
}

impl<'a> Iterator for Lexer<'a> {
    ...
    fn next(&'a mut self) -> Option<LexItem>{
        ....
        // slicing issue
        self.text = self.text[i .. self.text.len()]

    }
}

I get the error:

src/lexer.rs:64:5: 81:6 error: method `next` has an incompatible type for trait: expected bound lifetime parameter , found concrete lifetime [E0053]

The other implementation I tried:

impl<'a> Iterator for Lexer<'a> {
    ...
    fn next<'b>(&'b mut self) -> Option<LexItem>{
        ....
        // slicing issue
        self.text = self.text[i .. self.text.len()]

    }
}

This fails with:

src/lexer.rs:66:21: 66:52 error: mismatched types:
 expected `&'a str`,
    found `str`
(expected &-ptr,
    found str) [E0308]
src/lexer.rs:66         self.text = self.text[i .. self.text.len()];
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I figure that something like this should work, as I would work with subslices only.

wirrbel

1 Answer


(By the way: foo[i..foo.len()] should always be equivalent to foo[i..].)
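As a quick illustration of that equivalence, here is a minimal sketch. The helper name `tails` is made up for this example and does not come from the question.

```rust
/// Returns the tail of `s` starting at byte offset `i`, written both ways.
/// (`tails` is a hypothetical helper used only for illustration.)
fn tails(s: &str, i: usize) -> (&str, &str) {
    // Explicit upper bound, as written in the question.
    let explicit = &s[i..s.len()];
    // Open-ended range: the upper bound defaults to the end of the string.
    let open_ended = &s[i..];
    (explicit, open_ended)
}
```

Both forms panic in the same situations, namely when `i` is past the end of the string or does not fall on a UTF-8 character boundary.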

The type of self.text[i..] is the unsized type str, whether self.text is a String or a &str. A bare str cannot be stored in the field, so to make the assignment work you need to turn the slice into the same type as text.

If text is a String, call .to_string() on the result of the slicing; the method call takes a reference automatically, which makes it legal: self.text = self.text[i..].to_string(); (std::borrow::ToOwned::to_owned could also be used and would be slightly more efficient.)
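A minimal sketch of the String variant might look like this. The `advance` method is a hypothetical name for this example; in the question this assignment sits inside `next`.

```rust
struct Lexer {
    text: String,
}

impl Lexer {
    /// Drops the first `i` bytes of the remaining input.
    /// (`advance` is a hypothetical helper, not part of the question's code.)
    /// Panics if `i` is past the end or not on a char boundary.
    fn advance(&mut self, i: usize) {
        // The slice has type `str`; `to_owned` copies it into a new `String`,
        // which then replaces the old value of the field.
        self.text = self.text[i..].to_owned();
    }
}
```

Note that this allocates and copies the tail on every call, which is one reason the slice-based variant is usually the better fit for a lexer.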

If text is &str, just prefix the slicing operation with &, making it take a reference as is needed: self.text = &self.text[i..];.
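The slice-based variant could be sketched as follows. Again, `advance` is a hypothetical method name used only to isolate the assignment.

```rust
struct Lexer<'a> {
    text: &'a str,
}

impl<'a> Lexer<'a> {
    /// Drops the first `i` bytes of the remaining input without copying.
    /// (`advance` is a hypothetical helper, not part of the question's code.)
    fn advance(&mut self, i: usize) {
        // The leading `&` turns the unsized `str` into a `&'a str`,
        // which is exactly the type of the field.
        self.text = &self.text[i..];
    }
}
```

The receiver is a plain `&mut self`; the lifetime `'a` belongs to the borrowed text, not to the borrow of the lexer.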

For the whole lifetime matter, please read my answer to https://stackoverflow.com/a/24575591/497043; it explains your problems with fn next(&'a mut self) and so forth.

It looks to me like you want the whole thing to be based around string slices (&str) rather than owned strings (String). The former works for iterators (see the aforementioned answer) while the latter does not.
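To make this concrete, here is a rough sketch of a slice-based lexer implementing Iterator. It rests on assumptions that are not in the question: tokens are simply whitespace-separated words, and the item type is `&'a str` rather than the question's `LexItem`, whose definition is not shown.

```rust
struct Lexer<'a> {
    text: &'a str,
}

impl<'a> Iterator for Lexer<'a> {
    // Assumption: items are plain slices of the input. They borrow from the
    // original text, not from the lexer itself.
    type Item = &'a str;

    // Plain `&mut self`: the trait does not allow `&'a mut self` here.
    fn next(&mut self) -> Option<&'a str> {
        let rest = self.text.trim_start();
        if rest.is_empty() {
            self.text = rest;
            return None;
        }
        // Assumption: a token ends at the next whitespace or at end of input.
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let (token, tail) = rest.split_at(end);
        // Keep only the unconsumed tail; nothing is copied.
        self.text = tail;
        Some(token)
    }
}
```

With this in place, something like `Lexer { text: "let x = 42" }.collect::<Vec<_>>()` gathers the tokens, and they stay valid for as long as the input string does.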

Chris Morgan
    You mention _"std::borrow::ToOwned::to_owned could also be used and would be slightly more efficient."_; mind elaborating on why it would be slightly more efficient? Also, which of `to_string()`/`to_owned()` would be more idiomatic in this case? – julen Dec 30 '20 at 09:29
    `.to_string()` uses the `std::fmt` infrastructure, which is comparatively heavy. On simple cases like a string it should now optimise away fully at runtime (this was not *quite* true when I wrote the answer), but there are certain slight variants that you can form which won’t optimise away perfectly, and there’s still a little compile-time overhead too. I myself would probably most commonly write `String::from(text)` these days, but I wouldn’t blink at `text.to_string()`. – Chris Morgan Jan 03 '21 at 19:20
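For reference, the three spellings discussed in these comments can be put side by side. This is only a sketch showing that they yield the same String; the function name `owned_copies` is made up for the example, and it says nothing about their relative cost.

```rust
/// Builds an owned `String` from `text` in three equivalent ways.
/// (`owned_copies` is a hypothetical helper used only for illustration.)
fn owned_copies(text: &str) -> [String; 3] {
    [
        text.to_string(),   // via the `ToString` trait
        text.to_owned(),    // via the `ToOwned` trait
        String::from(text), // via `From<&str> for String`
    ]
}
```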