I've been working on a sample Rust program that searches for regular expressions (https://github.com/russellyoung/regexp-rust). It works, but I want to add a way to read the string to search from a file or stdin instead of from the command line. The program is single-threaded and has no event handling, so there are no coordination issues.
To do this I want a single global buffered string that can be referred to during the walk and results phases. Clearly the easiest way is to read the entire input before processing, but as an exercise I don't want to do that. Instead, whenever more text is needed, the program reads a fixed-size block and appends it to the current string. One way to do this is to pass the string down from the top into every subroutine. I don't want to do this for a couple of reasons. First, it is a large job that would require rewriting an entire module to allow for error returns where they were not needed before. Second, for the final report, and also for tracing and debugging, objects need to reference this string, and I can't pass it to them in the Debug trait.
So, I have a structure that looks like this:
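use std::io::{BufRead, BufReader};

enum Source { CmdLine, Stdin(BufReader<std::io::Stdin>), File(BufReader<std::fs::File>) }

impl Source {
    /// Reads more text from the source: whole lines are read until at least
    /// Input::BLOCK_SIZE bytes have accumulated or the input is exhausted.
    /// Returns an empty string for CmdLine, which has nothing more to read.
    fn extend(&mut self) -> std::io::Result<String> {
        let mut string = String::new();
        match self {
            Source::CmdLine => (),
            Source::File(stream) => while stream.read_line(&mut string)? > 0 && string.len() < Input::BLOCK_SIZE {},
            Source::Stdin(stream) => while stream.read_line(&mut string)? > 0 && string.len() < Input::BLOCK_SIZE {},
        }
        Ok(string)
    }
}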

/// Buffers the string to search. Unwanted characters can be removed from the front and new ones can be added to the end.
pub struct Input {
    /// The text currently in the buffer
    pub full_text: String,
    /// The source for getting more text
    source: Source,
    /// The byte offset of the front of the buffer (number of bytes deleted)
    b_start: usize,
    /// The character offset of the front of the buffer (number of characters deleted)
    c_start: usize,
    /// Start of the current search in the buffered text, in bytes
    b_offset: usize,
    /// Start of the current search in the buffered text, in chars
    c_offset: usize,
}

thread_local!(static INPUT: RefCell<Input> = RefCell::new(Input::from_stdin().unwrap()));

I want function calls to return slices of the global String Input::full_text. I understand the danger that such a reference would be invalidated when the string is extended, but this will never happen. In the walk phase, when the string can grow, the slice is used immediately, either to compare against a match or to print in a trace, and then goes out of scope. In the report phase the string does not grow, so its address is fixed (and even there, if it is global, I can get the slice when it is needed and don't need to keep it around).
When I compile the code changes I get a lifetime error (as I expected) saying that it can't return references to RefCell::borrow(). I know there are workarounds. Besides passing the structure around rather than making it global, I could pass closures into the Input instance to run the tests there and use my own debug routines instead of println!, or return string copies instead of slices, or even return the actual String and then give it back once it has been used. Is there any way short of unsafe{} that is considered best in this situation? Is this a case where unsafe{} can be used? Or is refactoring to pass the Input struct everywhere the only preferred way to do this?
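
For concreteness, here is a minimal sketch of the closure workaround mentioned above, with the string-copy variant layered on top of it. It is not taken from the actual program: Input is cut down to just its buffer, and with_slice, append and Span are hypothetical names made up for this illustration.

```rust
use std::cell::RefCell;
use std::fmt;
use std::ops::Range;

// Simplified stand-in for the question's Input; only the buffer is kept.
struct Input {
    full_text: String,
}

thread_local!(static INPUT: RefCell<Input> = RefCell::new(Input { full_text: String::new() }));

/// Hypothetical helper: runs `f` on a slice of the global buffer and returns
/// whatever `f` returns. The RefCell borrow ends before this function returns,
/// so no reference into the buffer escapes.
fn with_slice<R>(range: Range<usize>, f: impl FnOnce(&str) -> R) -> R {
    INPUT.with(|input| f(&input.borrow().full_text[range]))
}

/// Hypothetical helper standing in for extending the buffer from a Source.
fn append(text: &str) {
    INPUT.with(|input| input.borrow_mut().full_text.push_str(text));
}

/// Hypothetical match record that stores byte offsets instead of a &str.
struct Span {
    start: usize,
    end: usize,
}

impl fmt::Debug for Span {
    // Debug gets no extra arguments, so the text is looked up in the global buffer.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        with_slice(self.start..self.end, |s| write!(f, "Span({:?})", s))
    }
}

fn main() {
    append("hello world");
    // Walk phase: compare inside the closure; only a bool leaves it.
    let matched = with_slice(0..5, |s| s == "hello");
    println!("{}", matched);
    // String-copy variant: clone the slice when it must outlive the borrow.
    let copy = with_slice(6..11, |s| s.to_owned());
    println!("{}", copy);
    // Report phase: Debug reads the global buffer on demand.
    println!("{:?}", Span { start: 6, end: 11 });
}
```

One caveat with this shape: if the buffer is extended from inside a closure passed to with_slice, borrow_mut() panics at run time, so the "slice is used immediately" assumption becomes a checked invariant rather than a silent one.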