
I'm trying to implement a simple interpreter in Rust for a made-up programming language called rlox, following Bob Nystrom's book Crafting Interpreters.

I want errors to be able to occur in any child module, and for them to be "reported" in the main module (this is done in the book, with Java, by simply calling a static method on the containing class which prints the offending token and line). However, if an error occurs, it's not like I can just return early with Result::Err (which is, I assume, the idiomatic way to handle errors in Rust) because the interpreter should keep running - continually looking for errors.

Is there an (idiomatic) way for me to emulate the Java behaviour of calling a parent class' static method from a child class in Rust with modules? Should I abandon something like that entirely?

I thought about a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner and Token structs, but that seems unwieldy to me (I don't feel like an error reporter should be part of the struct's signature, am I wrong?):

struct Token {
    error_reporter: Rc<ErrorReporter>, // Should I avoid this?
    token_type: token::Type,
    lexeme: String,
    line: u32,
}

This is the layout of my project if you need to visualise what I'm talking about with regards to module relationships. Happy to provide some source code if necessary.

rlox [package]
└───src
    ├───main.rs (uses scanner + token mods, should contain logic for handling errors)
    ├───lib.rs (just exports scanner and token mods)
    ├───scanner.rs (uses token mod, declares scanner struct and impl)
    └───token.rs (declares token struct and impl)
jonny
  • You're right, idiomatic error handling usually halts execution at the first error. In this case, it might be better to treat errors not as errors but as first-class, expected data. You can think of the parsing as a traversal that reduces down to a final state consisting of a collection of tokens and a collection of errors. But it's hard to really offer advice without seeing your code. See: [mcve] – Peter Hall Apr 26 '18 at 16:36

1 Answer


Literal translation

Importantly, a Java static method has no access to any instance state. That means that it can be replicated in Rust by either a function or an associated function, neither of which has any state. The only difference is in how you call them:

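// A free function
fn example() {}

struct Something;

impl Something {
    // An associated function: no `self` parameter, so no instance state
    fn example() {}
}

fn main() {
    example();
    Something::example();
}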

Looking at the source you are copying, it doesn't "just" report the error, it has code like this:

public class Lox {
  static boolean hadError = false;

  static void error(int line, String message) {
    report(line, "", message);
  }

  private static void report(int line, String where, String message) {
    System.err.println(
        "[line " + line + "] Error" + where + ": " + message);
    hadError = true;
  }
}

I'm no JVM expert, but I'm pretty sure that using a static variable like that means that your code is no longer thread safe. You simply can't do that in safe Rust: mutating a static mut requires unsafe, so you can't "accidentally" make memory-unsafe code.

The most literal translation of this that is safe would use associated functions and atomic variables:

use std::sync::atomic::{AtomicBool, Ordering};

// ATOMIC_BOOL_INIT is deprecated; AtomicBool::new is a const fn
static HAD_ERROR: AtomicBool = AtomicBool::new(false);

struct Lox;

impl Lox {
    fn error(line: usize, message: &str) {
        Lox::report(line, "", message);
    }

    fn report(line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        HAD_ERROR.store(true, Ordering::SeqCst);
    }
}

You can also store richer data structures in your global state by using lazy_static and a Mutex or RwLock, if you need them.
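As a rough sketch of that idea, the global below collects every reported error instead of only setting a flag. It assumes Rust 1.63 or later, where Mutex::new is a const fn and a plain static works without lazy_static; on older toolchains the same shape would need lazy_static. The ReportedError type, the ERRORS static, and the had_error function are hypothetical names made up for this illustration.

```rust
use std::sync::Mutex;

// Hypothetical record type; the fields mirror the arguments passed to `error`.
#[derive(Debug, Clone, PartialEq)]
struct ReportedError {
    line: usize,
    message: String,
}

// Assumption: Rust 1.63+, where `Mutex::new` and `Vec::new` are both usable
// in a static initializer. Older compilers would need lazy_static here.
static ERRORS: Mutex<Vec<ReportedError>> = Mutex::new(Vec::new());

struct Lox;

impl Lox {
    // Prints the error and records it in the global list.
    fn error(line: usize, message: &str) {
        eprintln!("[line {}] Error: {}", line, message);
        ERRORS.lock().unwrap().push(ReportedError {
            line,
            message: message.to_string(),
        });
    }

    // True once at least one error has been recorded.
    fn had_error() -> bool {
        !ERRORS.lock().unwrap().is_empty()
    }
}
```

Any module can call Lox::error(line, "...") and the main module can check Lox::had_error() once scanning is done. This is still global state, so it shares the drawbacks discussed in the next section.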

Idiomatic translation

Although it might be convenient, I don't think such a design is good. Global state is simply terrible. I'd prefer to use dependency injection.

Define an error reporter structure that has the state and methods you need and pass references to the error reporter down to where it needs to be:

struct LoggingErrorSink {
    had_error: bool,
}

impl LoggingErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }

    fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        self.had_error = true;
    }
}

fn some_parsing_thing(errors: &mut LoggingErrorSink) {
    errors.error(0, "It's broken");
}

In reality, I'd rather define a trait for things that allow reporting errors and implement it for a concrete type. Rust makes this nice because generics are monomorphized, so there's no runtime cost compared to using the concrete type directly.

trait ErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }

    fn report(&mut self, line: usize, where_it_was: &str, message: &str);
}

struct LoggingErrorSink {
    had_error: bool,
}

impl ErrorSink for LoggingErrorSink {
    fn report(&mut self, line: usize, where_it_was: &str, message: &str) {
        eprintln!("[line {}] Error{}: {}", line, where_it_was, message);
        self.had_error = true;
    }
}

fn some_parsing_thing<L>(errors: &mut L)
where
    L: ErrorSink,
{
    errors.error(0, "It's broken");
}

There are many variations on this implementation, each with its own tradeoffs.

  • You could choose to have the logger take &self instead of &mut self, which would force this case to use something like a Cell to gain interior mutability of had_error.
  • You could use something like an Rc to avoid adding any extra lifetimes to the calling chain.
  • You could choose to store the logger as a struct member instead of a function parameter.
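To make these variants concrete, here is a minimal sketch that combines all three: the sink takes &self and keeps its flag in a Cell, it is shared through an Rc, and the scanner stores it as a struct member. SharedErrorSink, the simplified Scanner, and its scan method are hypothetical stand-ins, not the types from the question, and the "anything that is not alphanumeric is an error" rule exists only to have something to report.

```rust
use std::cell::Cell;
use std::rc::Rc;

// The methods take `&self`, so the flag needs interior mutability.
struct SharedErrorSink {
    had_error: Cell<bool>,
}

impl SharedErrorSink {
    fn new() -> Self {
        SharedErrorSink { had_error: Cell::new(false) }
    }

    fn error(&self, line: usize, message: &str) {
        eprintln!("[line {}] Error: {}", line, message);
        self.had_error.set(true);
    }

    fn had_error(&self) -> bool {
        self.had_error.get()
    }
}

// Hypothetical scanner that stores a shared handle to the sink as a member.
// Using `Rc` means no lifetime parameter appears on the struct.
struct Scanner {
    source: String,
    errors: Rc<SharedErrorSink>,
}

impl Scanner {
    fn new(source: &str, errors: Rc<SharedErrorSink>) -> Self {
        Scanner { source: source.to_string(), errors }
    }

    // Reports every character that is neither ASCII alphanumeric nor
    // whitespace, keeps scanning, and returns how many were reported.
    fn scan(&self) -> usize {
        let mut reported = 0;
        for (index, line) in self.source.lines().enumerate() {
            for c in line.chars() {
                if !(c.is_ascii_alphanumeric() || c.is_whitespace()) {
                    self.errors
                        .error(index + 1, &format!("Unexpected character '{}'.", c));
                    reported += 1;
                }
            }
        }
        reported
    }
}
```

The caller creates the sink with Rc::new(SharedErrorSink::new()), hands Rc::clone(&sink) to the scanner, and inspects sink.had_error() afterwards. The price is a reference count and a Cell in place of a plain &mut borrow.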

For your extra keyboard work, you get the benefit of being able to test your errors. Simply whip up a dummy implementation of the trait that saves information to internal variables and pass it in at test time.
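Such a dummy implementation might look like the sketch below. The trait and some_parsing_thing are repeated so the block stands alone; RecordingErrorSink is a hypothetical name for the test double.

```rust
trait ErrorSink {
    fn error(&mut self, line: usize, message: &str) {
        self.report(line, "", message);
    }

    fn report(&mut self, line: usize, where_it_was: &str, message: &str);
}

// Hypothetical test double: stores each report instead of printing it.
#[derive(Default)]
struct RecordingErrorSink {
    reports: Vec<(usize, String)>,
}

impl ErrorSink for RecordingErrorSink {
    fn report(&mut self, line: usize, _where_it_was: &str, message: &str) {
        self.reports.push((line, message.to_string()));
    }
}

fn some_parsing_thing<L>(errors: &mut L)
where
    L: ErrorSink,
{
    errors.error(0, "It's broken");
}
```

A test can then pass in a RecordingErrorSink::default(), run the code under test, and assert on the contents of reports.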

Opinions, ahoy!

a strategy where I inject a reference to some ErrorReporter struct as a dependency into the Scanner

Yes, dependency injection is an amazing solution to a large number of coding issues.

and Token structs

I don't know why a token would need to report errors, but it would make sense for the tokenizer to do so.

but that seems unwieldy to me. I don't feel like an error reporter should be part of the struct's signature, am I wrong?

I'd say yes, you are wrong; you've stated this as an absolute truth, of which very few exist in programming.

Concretely, very few people care about what is inside your type, probably only the implementer. The person who constructs a value of your type might care a little because they need to pass in dependencies, but this is a good thing. They now know that this value can generate errors that they need to handle "out-of-band", as opposed to reading some documentation after their program doesn't work.

A few more people care about the actual signature of your type. This is a double-edged sword. In order to have maximal performance, Rust will force you to expose your generic types and lifetimes in your type signatures. Sometimes, this sucks, but either the performance gain is worth it, or you can hide it somehow and take the tiny hit. That's the benefit of a language that gives you choices.
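As an illustration of that tradeoff, the sketch below shows the same scanner written both ways. One version exposes a lifetime and a generic parameter and uses static dispatch. The other hides the sink behind a boxed trait object and pays for dynamic dispatch. All names here (CountingSink, GenericScanner, BoxedScanner, fail, error_count) are hypothetical, and the trait is trimmed down to keep the example short.

```rust
trait ErrorSink {
    fn report(&mut self, line: usize, message: &str);
    fn error_count(&self) -> usize;
}

// Hypothetical sink that only counts what it is told.
struct CountingSink {
    count: usize,
}

impl ErrorSink for CountingSink {
    fn report(&mut self, _line: usize, _message: &str) {
        self.count += 1;
    }

    fn error_count(&self) -> usize {
        self.count
    }
}

// Variant 1: the lifetime and the generic parameter are part of the type's
// signature. Calls to the sink are statically dispatched.
struct GenericScanner<'a, L: ErrorSink> {
    errors: &'a mut L,
}

impl<'a, L: ErrorSink> GenericScanner<'a, L> {
    fn fail(&mut self) {
        self.errors.report(1, "Unexpected character.");
    }
}

// Variant 2: the sink is hidden behind a boxed trait object, so the type
// has no parameters. Calls go through dynamic dispatch.
struct BoxedScanner {
    errors: Box<dyn ErrorSink>,
}

impl BoxedScanner {
    fn fail(&mut self) {
        self.errors.report(1, "Unexpected character.");
    }
}
```

Code that names GenericScanner has to carry the lifetime and the sink type along, while BoxedScanner can be named without either. Which one fits depends on how much the signature noise or the indirection matters in your program.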


Shepmaster
  • you are a gentleman and a scholar. Thank you so much for the thorough answer, this has helped me endlessly!! – jonny Apr 27 '18 at 08:11
  • @jonny you are welcome, but please take care in the future about assuming someone's gender. Humans tend to dislike being misclassified. – Shepmaster Apr 27 '18 at 16:05
  • Just a figure of speech! No offence meant – jonny Apr 27 '18 at 16:29