I'm trying to use the crc-rs crate to hash some files.

There's already a common interface defined. Basically, all "hashers" should implement std::io::Write, so that you can simply do std::io::copy(&mut file, &mut hasher) (like Rust Crypto hashes do). crc-rs, though, uses a "manual" interface like so (from the example using a custom algorithm):

// use custom algorithm
const CUSTOM_ALG: Algorithm<u16> = Algorithm {
    poly: 0x8005,
    init: 0xffff,
    refin: false,
    refout: false,
    xorout: 0x0000,
    check: 0xaee7,
    residue: 0x0000
};
let crc = Crc::<u16>::new(&CUSTOM_ALG);
let mut digest = crc.digest();
digest.update(b"123456789");
assert_eq!(digest.finalize(), 0xaee7);

Notice that you first create a Crc and then you call crc.digest(), which returns a Digest. Internally, a digest seems to contain a reference to the actual Crc that created it, which must live at least as long as the Digest (am I understanding the lifetime correctly?).
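A minimal sketch of that borrowing relationship, using stand-in types only: `ToyCrc`, `ToyDigest`, and `checksum` are hypothetical names made up for illustration, not the crate's real definitions, and the "checksum" is a toy mixing function rather than a real CRC. It only mirrors the shape described above, where the digest holds a reference to the object that created it.

```rust
// Hypothetical stand-ins that mimic the *shape* of crc::Crc and crc::Digest.
// They are NOT the real crate types, and the arithmetic is not a real CRC.
struct ToyCrc {
    seed: u32,
}

struct ToyDigest<'a> {
    crc: &'a ToyCrc, // this borrow ties the digest to the ToyCrc that made it
    value: u32,
}

impl ToyCrc {
    // The returned digest cannot outlive `self`.
    fn digest(&self) -> ToyDigest<'_> {
        ToyDigest { crc: self, value: self.seed }
    }
}

impl<'a> ToyDigest<'a> {
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.value = self.value.rotate_left(5) ^ u32::from(b) ^ self.crc.seed;
        }
    }

    fn finalize(self) -> u32 {
        self.value
    }
}

// Fine: `crc` outlives `digest`, and only a plain u32 leaves the function.
fn checksum(data: &[u8]) -> u32 {
    let crc = ToyCrc { seed: 0xffff };
    let mut digest = crc.digest(); // borrows `crc`
    digest.update(data);
    digest.finalize()
}
```

Returning `digest` itself from `checksum` would be rejected for the same reason as E0515 in the output further down: it would reference a local that is dropped when the function returns.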

So, basically, I want to create an encapsulated struct that contains its own Crc and the Digest it generates. Then, I can implement std::io::Write over this struct (similar to this). Something along these lines:

pub struct CRC32Hasher<'a> {
    crc: crc::Crc<u32>,
    crc_digest: crc::Digest<'a, u32>,
}

impl<'a> CRC32Hasher<'a> { // Is the lifetime being used properly here?
    pub fn new() -> CRC32Hasher<'a> {
        let crc = crc::Crc::<u32>::new(&crc::CRC_32_CKSUM);

        let result = CRC32Hasher{
            crc_digest: crc.digest(),
            crc,
        };

        result
    }
}

cargo check output:

error[E0505]: cannot move out of `crc` because it is borrowed
  --> src\hash.rs:20:4
   |
14 | impl<'a> CRC32Hasher<'a> { // Is the lifetime being used properly here?
   |      -- lifetime `'a` defined here
...
19 |             crc_digest: crc.digest(),
   |                         ------------ borrow of `crc` occurs here
20 |             crc,
   |             ^^^ move out of `crc` occurs here
...
23 |         result
   |         ------ returning this value requires that `crc` is borrowed for `'a`

error[E0515]: cannot return value referencing local variable `crc`
  --> src\hash.rs:23:3
   |
19 |             crc_digest: crc.digest(),
   |                         ------------ `crc` is borrowed here
...
23 |         result
   |         ^^^^^^ returns a value referencing data owned by the current function

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.
error: could not compile `hasher` due to 2 previous errors

I'm afraid I'm very new to Rust (this is sort of a test project). I still don't quite get how lifetime generics work in practice, for example. I don't know how to start tackling this problem. Is there a standard way to deal with this kind of pattern? Similar problems with self-referential data led me to believe I need unsafe code, or at least std::ptr and std::pin::Pin? Am I way overthinking this?

I could of course simply change the interface or use another CRC implementation, but I got interested in the general problem. I would like to know what's the good practice in this kind of situation.

Edit:

I had come across this and similar questions. I think I understand why the problem happens (that post in particular helped me a lot). I also think I could fix it if crc_digest were a simple reference to data stored directly in the struct; the official Pin tutorial and every other one I've found seem to assume this case. I think (and this may be just my inexperience) my case is different because I'm not directly referencing data in the struct. I'm creating a new piece of data (the Digest), which happens to reference my data, and then I need to store that. I don't know what needs to change to do this safely.

blkhwk6
  • I'm not sure I understand why you need to store the Crc along with its Digest. It looks to me as if your intent is to implement io::Write to write a single file, and associate it with a specific Digest. Meanwhile, the Crc can be reused for multiple files, so I believe the Crc could exist outside of the CRC32Hasher struct so it's not recreated every time. Am I correct? – SirDarius Jan 04 '22 at 14:30
  • @SirDarius You are right that this is possible (and maybe the recommended thing in Rust). The original idea was to keep many similar "hasher" structures, maybe implementing other algorithms with their own interfaces, encapsulate them in a single struct implementing *my* interface, and dispatch it once from a function or map at the start (I know, probably OOP habits here). Basically, I was willing to accept the small space overhead for a pretty, unified API. Even if it's not the most efficient way to do it, I thought I could learn more about Rust by taking the uphill road, so to speak. – blkhwk6 Jan 05 '22 at 06:42
  • I really recommend [this](https://morestina.net/blog/1868/self-referential-types-for-fun-and-profit) blog post which explains the situation well and offers a potential solution. – RichardW Jan 07 '23 at 01:21
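
A minimal sketch of the approach suggested in the comments above, where the Crc lives outside the hasher and the hasher stores only the Digest. To stay self-contained it uses stand-in types: `ToyCrc`, `ToyDigest`, `SHARED_CRC`, `Hasher`, and `hash_reader` are all hypothetical names, and the checksum is a toy, not a real CRC. Whether the same shape works with crc-rs depends on being able to put a Crc in a `static` or otherwise keep it alive longer than the hasher, which is an assumption here.

```rust
use std::io::{self, Write};

// Hypothetical stand-ins mimicking the shape of crc::Crc / crc::Digest.
struct ToyCrc {
    seed: u32,
}

struct ToyDigest<'a> {
    crc: &'a ToyCrc,
    value: u32,
}

impl ToyCrc {
    const fn new(seed: u32) -> Self {
        ToyCrc { seed }
    }

    fn digest(&self) -> ToyDigest<'_> {
        ToyDigest { crc: self, value: self.seed }
    }
}

impl ToyDigest<'_> {
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.value = self.value.rotate_left(5) ^ u32::from(b) ^ self.crc.seed;
        }
    }

    fn finalize(self) -> u32 {
        self.value
    }
}

// Lives for the whole program, so digests borrowed from it are 'static.
static SHARED_CRC: ToyCrc = ToyCrc::new(0xffff);

// The hasher owns only the digest; the ToyCrc is borrowed from outside,
// so the struct never has to reference its own field.
struct Hasher<'a> {
    digest: ToyDigest<'a>,
}

impl<'a> Hasher<'a> {
    fn new(crc: &'a ToyCrc) -> Self {
        Hasher { digest: crc.digest() }
    }

    fn finalize(self) -> u32 {
        self.digest.finalize()
    }
}

impl Write for Hasher<'_> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.digest.update(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Works with anything readable, e.g. a std::fs::File or a byte slice.
fn hash_reader<R: io::Read>(mut reader: R) -> io::Result<u32> {
    let mut hasher = Hasher::new(&SHARED_CRC);
    io::copy(&mut reader, &mut hasher)?;
    Ok(hasher.finalize())
}
```

Because the borrow points at a `static`, `Hasher::new(&SHARED_CRC)` yields a `Hasher<'static>`, which can be returned, stored, or boxed without any self-reference. The trade-off is that the Crc is no longer encapsulated inside the hasher struct.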

0 Answers