0

I'm new to Rust and am likely have a huge knowledge gap. Basically, I'm hoping to be create a utility function that would except a regular text file or a ZIP file and return a BufRead where the caller can start processing line by line. It is working well for non ZIP files but I am not understanding how to achieve the same for the ZIP files. The ZIP files will only contain a single file within the archive which is why I'm only processing the first file in the ZipArchive.

I'm running into the the following error.

error[E0515]: cannot return value referencing local variable `archive_contents`
  --> src/file_reader.rs:30:9
   |
27 |         let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();
   |                                                ---------------- `archive_contents` is borrowed here
...
30 |         Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

It seems the archive_contents is preventing the BufRead object from returning to the caller. I'm just not sure how to work around this.

file_reader.rs

use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;

pub struct FileReader {
    pub file_reader: Result<Box<BufRead>, &'static str>,
}

pub fn file_reader(filename: &str) -> Result<Box<BufRead>, &'static str> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => panic!(
            "ERROR: Could not open file, {}: {}",
            path.display(),
            why.to_string()
        ),
    };

    if path.extension() == Some(OsStr::new("zip")) {
        // Processing ZIP file.
        let mut archive_contents: zip::read::ZipArchive<std::fs::File> =
            zip::ZipArchive::new(file).unwrap();

        let archive_file: zip::read::ZipFile = archive_contents.by_index(0).unwrap();

        // ERRORS: returns a value referencing data owned by the current function
        Ok(Box::new(BufReader::with_capacity(128 * 1024, archive_file)))
    } else {
        // Processing non-ZIP file.
        Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
    }
}

main.rs

mod file_reader;

use std::io::BufRead;

fn main() {
    let mut files: Vec<String> = Vec::new();

    files.push("/tmp/text_file.txt".to_string());
    files.push("/tmp/zip_file.zip".to_string());

    for f in files {
        let mut fr = match file_reader::file_reader(&f) {
            Ok(fr) => fr,
            Err(e) => panic!("Error reading file."),
        };

        fr.lines().for_each(|l| match l {
            Ok(l) => {
                println!("{}", l);
            }
            Err(e) => {
                println!("ERROR: Failed to read line:\n  {}", e);
            }
        });
    }
}

Any help is greatly appreciated!

hooinator
  • 339
  • 2
  • 13
  • Does this answer your question? [Is there any way to return a reference to a variable created in a function?](https://stackoverflow.com/questions/32682876/is-there-any-way-to-return-a-reference-to-a-variable-created-in-a-function) – Jmb May 06 '20 at 06:57

3 Answers3

1

It seems the archive_contents is preventing the BufRead object from returning to the caller. I'm just not sure how to work around this.

You have to restructure the code somehow. The issue here is that, well, the archive data is part of the archive. So unlike file, archive_file is not an independent item, it is rather a pointer of sort into the archive itself. Which means the archive needs to live longer than archive_file for this code to be correct.

In a GC'd language this isn't an issue, archive_file has a reference to archive and will keep it alive however long it needs. Not so for Rust.

A simple way to fix this would be to just copy the data out of archive_file and into an owned buffer you can return to the parent. An other option might be to return a wrapper for (archive_contents, item_index), which would delegate the reading (might be somewhat tricky though). Yet another would be to not have file_reader.

Masklinn
  • 34,759
  • 3
  • 38
  • 57
  • Thanks @Masklinn, would you be able to point me to documentation to create a "owned buffer"? – hooinator May 06 '20 at 17:22
  • @hooinator you can read the contents of the file into a vec, then wrap a `std::io::Cursor` around the vec. `Cursor` implements `BufRead`, so then you should be able to box the cursor and return it as a trait object. – Masklinn May 06 '20 at 19:22
1

Thanks to @Masklinn for the direction! Here's the working solution using their suggestion.

file_reader.rs

use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::io::Cursor;
use std::io::Error;
use std::io::Read;
use std::path::Path;
use zip::read::ZipArchive;

pub fn file_reader(filename: &str) -> Result<Box<dyn BufRead>, Error> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => return Err(why),
    };

    if path.extension() == Some(OsStr::new("zip")) {
        let mut archive_contents = ZipArchive::new(file)?;

        let mut archive_file = archive_contents.by_index(0)?;

        // Read the contents of the file into a vec.
        let mut data = Vec::new();

        archive_file.read_to_end(&mut data)?;

        // Wrap vec in a std::io::Cursor.
        let cursor = Cursor::new(data);

        Ok(Box::new(cursor))
    } else {
        // Processing non-ZIP file.
        Ok(Box::new(BufReader::with_capacity(128 * 1024, file)))
    }
}
hooinator
  • 339
  • 2
  • 13
0

While the solution you have settled on does work, it has a few disadvantages. One is that when you read from a zip file, you have to read the contents of the file you want to process into memory before proceeding, which might be impractical for a large file. Another is that you have to heap allocate the BufReader in either case.

Another possibly more idiomatic solution is to restructure your code, such that the BufReader does not need to be returned from the function at all - rather, structure your code so that it has a function that opens the file, which in turn calls a function that processes the file:

use std::ffi::OsStr;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::path::Path;

pub fn process_file(filename: &str) -> Result<usize, String> {
    let path = Path::new(filename);
    let file = match File::open(&path) {
        Ok(file) => file,
        Err(why) => return Err(format!(
            "ERROR: Could not open file, {}: {}",
            path.display(),
            why.to_string()
        )),
    };

    if path.extension() == Some(OsStr::new("zip")) {
        // Handling a zip file
        let mut archive_contents=zip::ZipArchive::new(file).unwrap();
        let mut buf_reader = BufReader::with_capacity(128 * 1024,archive_contents.by_index(0).unwrap());
        process_reader(&mut buf_reader)
    } else {
        // Handling a plain file.
        process_reader(&mut BufReader::with_capacity(128 * 1024, file))
    }

}

pub fn process_reader(reader: &mut dyn BufRead) -> Result<usize, String> {
    // Example, just count the number of lines
    return Ok(reader.lines().count());
}

fn main() {
    let mut files: Vec<String> = Vec::new();

    files.push("/tmp/text_file.txt".to_string());
    files.push("/tmp/zip_file.zip".to_string());

    for f in files {

        match process_file(&f) {
            Ok(count) => println!("File {} Count: {}", &f, count),
            Err(e) => println!("Error reading file: {}", e),
        };

    }
}

This way, you don't need any Boxes and you don't need to read the file into memory before processing it.

A drawback to this solution would if you had multiple functions that need to be able to read from zip files. One way to handle that would be to define process_file to take a callback function to do the processing. First you would change the definition of process_file to be:

pub fn process_file<C>(filename: &str, process_reader: C) -> Result<usize, String>
    where C: FnOnce(&mut dyn BufRead)->Result<usize, String>

The rest of the function body can be left unchanged. Now, process_reader can be passed into the function, like this:

process_file(&f, count_lines)

where count_lines would be the original simple function to count the lines, for instance.

This would also allow you to pass in a closure:

process_file(&f, |reader| Ok(reader.lines().count()))
harmic
  • 28,606
  • 5
  • 67
  • 91