
I'm using the zstd-rs crate to decompress a large Zstandard file (~30GB) in Rust. My code opens the file, reads it into a buffer, and then uses the copy_decode function from the zstd::stream module to decompress the data and write it to an output file. Here is my current implementation:

use std::fs::File;
use std::io::{self, BufReader, BufWriter, Write};
use zstd::stream::copy_decode;

const SIZE: usize = 1024 * 1024 * 8;

fn decompress_file(input_file: &str, output_file: &str) -> io::Result<()> {
    let input_file = File::open(input_file)?;
    let output_file = File::create(output_file)?;

    let mut reader = BufReader::with_capacity(SIZE, input_file);
    let mut writer = BufWriter::new(output_file);

    copy_decode(&mut reader, &mut writer)?;

    writer.flush()?;

    Ok(())
}

fn main() {
    let input_file = "R:\\reddit\\RS_2021-03.zst";
    let output_file = "R:/out.data";

    match decompress_file(input_file, output_file) {
        Ok(_) => println!("Decompression successful!"),
        Err(e) => println!("An error occurred: {}", e),
    }
}

However, when I run the program, I receive the following error: `An error occurred: Frame requires too much memory for decoding`. From what I've found, the error seems to be related to memory allocation during decompression, but I'm unsure of the best way to handle it. I've tried to minimize the program's memory footprint by using buffered reading and writing with an 8MB buffer, but I'm still running into this issue.

Does anyone have suggestions for handling large files with zstd-rs, or alternative methods for Zstandard decompression in Rust?

I initially attempted to use the zstd-safe crate for the decompression, but ran into issues handling the large input file. I then switched to the zstd-rs crate with buffered reading and writing, hoping it would handle larger files more efficiently. I also tried adjusting the buffer sizes used by the BufReader and BufWriter, testing values from 0.5MB up to 8MB.

I expected that using buffered I/O and zstd-rs's stream-based decoding would be able to handle the large file size, but I am still encountering the 'Frame requires too much memory for decoding' error.

  • The implementation of `copy_decode` seems to create a `zstd::stream::read::Decoder`, then call `std::io::copy` with it and the output. Maybe you can manually create a decoder, read N bytes (e.g. 4 KiB) from the decoder, then write them manually to the output without `std::io::copy`? I suspect `std::io::copy` might try to read into a buffer that's too big for `zstd` to use? – Filipe Rodrigues Jun 10 '23 at 18:20
  • I also tried decompressing this dataset and couldn't figure it out. Looking at the example Python, they use an enormous window size, so maybe try that? https://github.com/Watchful1/PushshiftDumps/blob/master/scripts/to_csv.py#L40 – drewtato Jun 10 '23 at 19:17
  • I previously wrote the program on Windows using a database to insert the rows in chunks, but the inserts were rather slow, so I was trying to do the DB insert and data destructuring in Rust, which is around 5x faster. I guess the performance of the Rust zstd should be equal to or better than the Python version. Furthermore, I am planning to fetch the live data, so I was hoping to do everything in one language. – weiserhase Jun 10 '23 at 20:36
  • I believe you need to explicitly set the max allowed window size while decoding to match the very high limit these compressed files have. I cannot download a 30gb file to test, but can you try creating a decoder and calling `.window_log_max(31)` on it before doing the decompression? https://docs.rs/zstd/latest/zstd/stream/read/struct.Decoder.html#method.window_log_max – Dogbert Jun 11 '23 at 08:21
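
Putting the last two comments together, a rough, untested sketch of the adjusted function might look like the following. It builds the `zstd::stream::read::Decoder` by hand and calls `window_log_max(31)` (the method from the docs linked above) before copying; whether a window log of 31 is large enough for this particular dump is an assumption.

use std::fs::File;
use std::io::{self, BufReader, BufWriter, Write};
use zstd::stream::read::Decoder;

const SIZE: usize = 1024 * 1024 * 8;

fn decompress_file(input_file: &str, output_file: &str) -> io::Result<()> {
    let input_file = File::open(input_file)?;
    let output_file = File::create(output_file)?;

    // Build the streaming decoder explicitly instead of going through copy_decode,
    // so the maximum back-reference window can be raised before decoding starts.
    let mut decoder = Decoder::with_buffer(BufReader::with_capacity(SIZE, input_file))?;
    // Accept frames with a window of up to 2^31 bytes; the default limit is lower,
    // which appears to be what triggers "Frame requires too much memory for decoding".
    decoder.window_log_max(31)?;

    let mut writer = BufWriter::new(output_file);
    io::copy(&mut decoder, &mut writer)?;
    writer.flush()?;

    Ok(())
}

main would stay the same. Filipe's suggestion of copying in small chunks by hand could be layered on top of this, but `std::io::copy` over the decoder with the raised limit may already be enough.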

0 Answers