I'm currently working on creating the programming language described in "Writing an Interpreter in Go", but I'm trying to write the code in Rust.

I have the following struct representing the lexer:

struct Lexer<'a> {
    input: &'a [u8], // assume the text input is all ascii
    curr_pos: Cell<usize>,
    read_pos: Cell<usize>,
    curr_char: Cell<char>,
}

and the following struct representing a token:

struct Token<'a> {
    tokenType: TokenType<'a>,
    literal: &'a str,
}

where tokenType is an enum representing the various kinds of tokens. It's pretty standard, but I'm having trouble with the ownership of literal.

When I try to create a token:

Token {tokenType: TokenType::ASSIGN, literal: &self.curr_char.get().to_string()}

I get an error about the value &self.curr_char.get().to_string() not living long enough. How do ownership rules come into play here, and what is the best way to "give" a Token its value?

Here's the full code:

use std::cell::Cell;
use std::str;
use std::char;

#[derive(Debug, PartialEq)]
enum TokenType<'a> {
    ILLEGAL,
    EOF,
    IDENT(&'a str),
    INT(i64),
    ASSIGN,
    PLUS,
    COMMA,
    SEMICOLON,
    LPAREN,
    RPAREN,
    LBRACE,
    RBRACE,
    FUNCTION,
    LET,
}

struct Token<'a> {
    tokenType: TokenType<'a>,
    literal: &'a str,
}

struct Lexer<'a> {
    input: &'a [u8], // assume the text input is all ascii
    curr_pos: Cell<usize>,
    read_pos: Cell<usize>,
    curr_char: Cell<char>,
}

impl<'a> Lexer<'a> {
    fn next_token(&self) -> Token {
        const NULL_CHAR: char = 0 as char;
        // The match is the tail expression, so its value is what the function returns.
        match self.curr_char.get() {
            '='         => Token {tokenType: TokenType::ASSIGN,    literal: &self.curr_char.get().to_string()},
            ';'         => Token {tokenType: TokenType::SEMICOLON, literal: &self.curr_char.get().to_string()},
            '('         => Token {tokenType: TokenType::LPAREN,    literal: &self.curr_char.get().to_string()},
            ')'         => Token {tokenType: TokenType::RPAREN,    literal: &self.curr_char.get().to_string()},
            '{'         => Token {tokenType: TokenType::LBRACE,    literal: &self.curr_char.get().to_string()},
            '}'         => Token {tokenType: TokenType::RBRACE,    literal: &self.curr_char.get().to_string()},
            ','         => Token {tokenType: TokenType::COMMA,     literal: &self.curr_char.get().to_string()},
            '+'         => Token {tokenType: TokenType::PLUS,      literal: &self.curr_char.get().to_string()},
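            '\0'        => Token {tokenType: TokenType::EOF,       literal: &self.curr_char.get().to_string()},
            // Catch-all arm so the match is exhaustive over `char`.
            _           => Token {tokenType: TokenType::ILLEGAL,   literal: &self.curr_char.get().to_string()},
        }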
    }
}
ryanyz10
  • @Shepmaster will do, sorry I haven't looked much into variable naming conventions so I mix them up all the time. And literal = string literal representing the token, I don't see how I'm using that incorrectly. It wasn't meant to be a literal in terms of a programming language. – ryanyz10 Aug 20 '18 at 19:23
  • Re "literal": your original post didn't use backticks (\`) to mark up `literal` as code, so it wasn't clear you were referring to the field named "literal" instead of an actual Rust-source-code literal. – Shepmaster Aug 20 '18 at 19:26
  • @Shepmaster I looked at the question you linked and it doesn't quite address my problem. I understand why I can't build a `String` and return `&str`, my question is more how ownership works in `&self.curr_char.get().to_string()`. From my understanding, literal types transfer ownership because they can be copied but that's apparently an incorrect understanding, and I don't know how memory is allocated in this case. – ryanyz10 Aug 20 '18 at 19:39
  • "literal types": and now I think you are using the term differently from its general meaning again. A literal is `"foo"` or `'x'` or `123`. There are *no literals* in `&self.curr_char.get().to_string()`. *I understand why I can't build a `String` and return `&str`*, but the code you have posted is attempting to build a `String` and return a `&str`, so help us figure out what part of the explanation is lacking. – Shepmaster Aug 20 '18 at 19:52
  • @Shepmaster The value enclosed in the `curr_char` cell is a `char` type, which is a literal, no? Then `self.curr_char.get()` returns a `char` type, which would produce a copy, so no ownership problems. I'm trying to understand how memory works in this case when calling `to_string()` on the `char`. Sorry my question isn't clear, I was (and still am) having trouble phrasing it. – ryanyz10 Aug 20 '18 at 19:59
  • `Cell` has nothing to do with this. [This code produces the same error with no `Cell`](http://play.rust-lang.org/?gist=fc25a1224c616c06f6ff4fb25e9f3b4b&version=stable&mode=debug&edition=2015). `to_string` creates a new, owned `String`, as its signature suggests: [`fn to_string(&self) -> String`](https://doc.rust-lang.org/std/string/trait.ToString.html#tymethod.to_string) – trent Aug 20 '18 at 20:07
  • *The value enclosed in the `curr_char` cell is a `char` type*: yes. *which is a literal*: no. *returns a `char` type*: yes. *calling `to_string()` on the `char`*: allocates a completely brand new `String`, as trentcl points out. Just like the proposed duplicate creates a new `String` via `to_string`. – Shepmaster Aug 20 '18 at 20:11
  • I guess I don't understand what a literal is then. Does a character literal have to be of the form `'a'`? But the word I was looking for was primitive, not literal, so my mistake. Anyways, if `to_string()` creates a completely brand-new string, then what has ownership of it? Is that where my problem comes from? – ryanyz10 Aug 20 '18 at 20:19
  • *then what has ownership of it* — nothing (see 3rd linked duplicate) *Is that where my problem comes from* — yes. – Shepmaster Aug 20 '18 at 23:43
  • Here's a [minimal amount of change that you can make to get your code to compile](https://play.rust-lang.org/?gist=393ae7907dc5e42c3f9e76ff9997cf71&version=stable&mode=debug&edition=2015). – Shepmaster Aug 20 '18 at 23:48
  • Thanks for the help! I'm currently rewriting my code...the issue I had was a result of my lack of understanding of Rust and I think hardcoding values is just a temporary fix. If I have all the `literal` fields as references to the `input` byte slice, would that get rid of the ownership problems? To clarify something like: `&self.input[start..end]` and then condensing that into an `&str` with `str::from_utf8`. – ryanyz10 Aug 20 '18 at 23:52
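
The approach floated in the last comment (slicing `input` and converting the slice with `str::from_utf8`) can be sketched roughly as follows. This is a minimal illustration, not the book's lexer and not a confirmed fix from the thread: the trimmed-down `TokenType`, the snake_case `token_type` field, the `pos` field, and `Lexer::new` are all assumptions made to keep the example short. The key point is that `next_token` returns `Token<'a>`, so each `literal` borrows from the input buffer rather than from a temporary `String` that is dropped at the end of the statement.

```rust
use std::str;

// Trimmed-down token kinds (hypothetical subset of the question's enum).
#[derive(Debug, PartialEq)]
enum TokenType {
    Illegal,
    Eof,
    Assign,
    Plus,
    Semicolon,
}

#[derive(Debug, PartialEq)]
struct Token<'a> {
    token_type: TokenType,
    // Borrowed from the lexer's input, so it is valid as long as the input is.
    literal: &'a str,
}

struct Lexer<'a> {
    input: &'a [u8], // assumed to be ASCII, as in the question
    pos: usize,      // hypothetical replacement for the Cell-based positions
}

impl<'a> Lexer<'a> {
    // Hypothetical constructor; the question does not show one.
    fn new(input: &'a str) -> Self {
        Lexer { input: input.as_bytes(), pos: 0 }
    }

    // Returns Token<'a>: the token is tied to the input, not to `&mut self`.
    fn next_token(&mut self) -> Token<'a> {
        // Copy the shared reference out so the slice below has lifetime 'a.
        let input: &'a [u8] = self.input;
        if self.pos >= input.len() {
            return Token { token_type: TokenType::Eof, literal: "" };
        }
        let start = self.pos;
        self.pos += 1;
        let bytes = &input[start..self.pos];
        let token_type = match bytes[0] {
            b'=' => TokenType::Assign,
            b'+' => TokenType::Plus,
            b';' => TokenType::Semicolon,
            _ => TokenType::Illegal,
        };
        // from_utf8 only reinterprets the borrowed bytes; no String is allocated.
        // A lone non-ASCII byte is not valid UTF-8 and falls back to "".
        let literal = str::from_utf8(bytes).unwrap_or("");
        Token { token_type, literal }
    }
}

fn main() {
    let source = String::from("=+;");
    let mut lexer = Lexer::new(&source);
    // Tokens can be collected while the lexer keeps advancing,
    // because they borrow from `source`, not from `lexer`.
    let mut tokens = Vec::new();
    loop {
        let token = lexer.next_token();
        let done = token.token_type == TokenType::Eof;
        tokens.push(token);
        if done {
            break;
        }
    }
    println!("{:?}", tokens);
}
```

If a token ever needs text that does not appear verbatim in the input, the other common option is to make `literal` an owned `String`, so the `Token` itself owns the allocation that `to_string()` creates.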