I've been working on a sample Rust program that searches text using regular expressions (https://github.com/russellyoung/regexp-rust). It works, but I want to add a way to read the string to search from a file or stdin instead of from the command line. The program is single-threaded and has no events, so there are no coordination issues.

To do this I want a single global buffered string that can be referred to during the walk and results phases. Clearly the easiest way to do this is to read the entire input before processing, but as an exercise I don't want to do that. Instead, the buffer is extended by a fixed amount of text whenever more input is needed. One way to do this is to pass the string down from the top into every subroutine. I don't want to do this for a couple of reasons: first, it is a large job that would require rewriting an entire module to allow for error returns where they were not needed before. Second, for the final report, and also for tracing and debugging, objects need to reference this string, and I can't pass it to them through the Debug trait.

So, I have a structure that looks like this:

use std::cell::RefCell;
use std::io::{BufRead, BufReader};

enum Source { CmdLine, Stdin(BufReader<std::io::Stdin>), File(BufReader<std::fs::File>) }

impl Source {
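    /// Reads whole lines from the source until at least Input::BLOCK_SIZE bytes
    /// have been collected or the input is exhausted, and returns the new text.
    /// Returns an empty string for CmdLine or at end of input.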
    fn extend(&mut self) -> std::io::Result<String> {
        let mut string = "".to_string();
        match self {
            Source::CmdLine => (),
            Source::File(stream) => while stream.read_line(&mut string)? > 0 && string.len() < Input::BLOCK_SIZE {},
            Source::Stdin(stream) => while stream.read_line(&mut string)? > 0 && string.len() < Input::BLOCK_SIZE {},
        }
        Ok(string)
    }
}

/// Buffers the string to search. Unwanted characters can be removed from the front and new ones can be added to the end.
pub struct Input {
    /// The text currently in the buffer
    pub full_text: String,
    /// The source for getting more text
    source: Source,
    /// the byte offset of the front of the buffer (number of bytes deleted)
    b_start: usize,
    /// the character offset of the front of the buffer (number of characters deleted)
    c_start: usize,
    /// start of the current search in the buffered text, in bytes
    b_offset: usize,
    /// start of the current search in the buffered text, in chars
    c_offset: usize
}

thread_local!(static INPUT: RefCell<Input> = RefCell::new(Input::from_stdin().unwrap()));
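As a side note, the File and Stdin arms of Source::extend repeat the same loop. A minimal sketch of one way to share it, assuming a free helper named read_block (hypothetical, not in the original code) and a stand-in BLOCK_SIZE constant, since the real value of Input::BLOCK_SIZE is not shown:

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for `Input::BLOCK_SIZE`; the real value is not shown.
const BLOCK_SIZE: usize = 16;

/// Reads whole lines from `stream` until at least `BLOCK_SIZE` bytes have been
/// collected or the stream is exhausted, and returns the text that was read.
/// Works for any `BufRead`, so `BufReader<Stdin>` and `BufReader<File>` can share it.
fn read_block<R: BufRead>(stream: &mut R) -> io::Result<String> {
    let mut block = String::new();
    // `read_line` appends to `block` and returns the number of bytes read
    // (0 at end of input); the `?` propagates any I/O error to the caller.
    while block.len() < BLOCK_SIZE && stream.read_line(&mut block)? > 0 {}
    Ok(block)
}
```

With such a helper, each arm of the match could call read_block(stream) instead of repeating the while loop. The stopping rule is the same as in the loop above as long as the block size is greater than zero.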

I want function calls to return slices of the global String Input::full_text. I understand the danger that a reference will be invalidated when the string is extended, but this will never happen. In the walk phase, when the string can grow, a slice is used immediately, either to compare against a match or to print in a trace, and then goes out of scope. In the report phase the string does not grow, so its address is fixed (and even there, if it is global, I can get the slice when it is needed and don't need to keep it around).

When I compile the changes I get a lifetime error (as I expected): a reference derived from RefCell::borrow() can't be returned. I know there are workarounds. Besides passing the structure around rather than making it global, I could pass closures into the Input instance to run the tests there and use my own debug routines instead of println!, or return string copies instead of slices, or even return the actual String and then give it back once it has been used. Is there any way short of unsafe{} that is considered best in this situation? Is this a case where unsafe{} can be used? Or is refactoring to pass the Input struct everywhere the only preferred way to do this?
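For reference, the closure and string-copy workarounds can be sketched without unsafe roughly as follows. This is only an illustration under assumptions: Input is cut down to its text field, the global is seeded with fixed text rather than Input::from_stdin(), and the helper names with_slice, copy_slice and append are hypothetical.

```rust
use std::cell::RefCell;
use std::ops::Range;

/// Cut-down, hypothetical version of `Input`: only the buffered text.
struct Input {
    full_text: String,
}

thread_local! {
    // Assumption: seeded with fixed text instead of `Input::from_stdin()`.
    static INPUT: RefCell<Input> = RefCell::new(Input {
        full_text: String::from("hello world"),
    });
}

/// Runs `f` on a slice of the global buffer while the borrow is alive.
/// The slice never escapes the closure, so nothing outlives the `Ref` guard.
/// Panics if `range` is out of bounds or not on a UTF-8 character boundary.
fn with_slice<T>(range: Range<usize>, f: impl FnOnce(&str) -> T) -> T {
    INPUT.with(|input| f(&input.borrow().full_text[range]))
}

/// Returns an owned copy for cases where the text must outlive the borrow.
fn copy_slice(range: Range<usize>) -> String {
    with_slice(range, |s| s.to_owned())
}

/// Appends text to the global buffer. Panics if called from inside a
/// `with_slice` closure, because the `RefCell` is already borrowed there.
fn append(text: &str) {
    INPUT.with(|input| input.borrow_mut().full_text.push_str(text));
}
```

A comparison during the walk phase would then look like with_slice(6..11, |s| s == "world"). The RefCell turns the "string grows while a slice is live" mistake into a runtime panic rather than undefined behavior, which is the main thing unsafe would give up.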

Chayim Friedman
russell
    Passing the data through is generally the way you want to do this anyway. Global variables generally hint that the design is poor. A function's dependencies should be communicated in its argument list; global variables subvert this, giving the function a hidden dependency. – cdhowie Jun 01 '23 at 06:23
  • @cafce25 Thank you, I missed that one, it is just what I need - mark this as duplicate. – russell Jun 01 '23 at 21:20
  • @cdhowie I see that now - one of the things I learned from this design is to plan ahead better. Last week I spent over a day trying to retrofit it by passing it down, which required not only adding the reference to every call but also adding error returns everywhere. If I had recognized this problem earlier I would have done it that way, so I guess that supports your evaluation that the current design is poor. I've spent far too much time on this though, and so am willing to settle for a less-good design as long as it is not too ugly ("unsafe"). cdhowie's answer does what I need I think – russell Jun 01 '23 at 21:29

0 Answers