Cache deserialized results in Serde?

Question

I'm writing a small budgeting program as a side project to learn Rust, using Serde for serialization and deserialization. There may be a large number of transactions, each accessed by UUID, so I would like to cache UUID->Transaction in a HashMap while deserializing. However, I am getting a recursion error and cannot figure out why.

Transaction is the type that I would like cached:

use serde::{Serialize};
use uuid::Uuid;

#[derive(Serialize, Debug)]
pub struct Transaction {

    pub(crate) id: Uuid,
    ...
}

Each of these is owned by an Account, which also provides the custom deserialization code:

use serde::{Serialize, Deserialize};

use uuid::Uuid;

#[derive(Serialize, Deserialize)]
pub struct Account {

    pub(super) id: Uuid,
    pub name: String,
    pub transactions: Vec<Transaction>
}

impl<'de> Deserialize<'de> for Transaction {

    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: serde::de::Deserializer<'de> {

        let trans = serde::de::Deserialize::deserialize(deserializer)?;
        // Caching would go here
        Ok(trans)
    }
}

However, the call to Deserialize::deserialize(deserializer)? gives me the following error:

30 | /     fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
31 | |         where D: serde::de::Deserializer<'de> {
   | |_____________________________________________^ cannot return without recursing
32 |
33 |           let trans = serde::de::Deserialize::deserialize(deserializer)?;
   |                       ------------------------------------------------- recursive call site
   |
   = help: a `loop` may express intention better if this is on purpose
   = note: `#[warn(unconditional_recursion)]` on by default

I see how this could be a recursive call, and I know the compiler is right, because the stack overflows at runtime. But what should I do to break the cycle? I was following the advice given in another post here or even here. I don't see how that code differs from my own, i.e., why the other code does not have the same recursion problem.

It's not that it might recurse; it definitely will. There is no condition (no `if`, `match`, or other control flow) that provides a path out, so calling `deserialize` results in it calling itself until the stack overflows. Unfortunately, it's not clear exactly what you want to cache, so there is little help I can offer. The difference in the examples is that they deserialize to types other than `Self`, so they call a different `deserialize`. — cafce25, Mar 10 '23 at 15:24
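The point about `Self` in the comment above can be illustrated without Serde. The following is a minimal sketch using a hypothetical toy trait, `Parse`, as a stand-in for `Deserialize`; `RawTransaction`, the `amount` field, the `u128` id, and the `id,amount` text format are all assumptions made for the example. Inside `impl Parse for Transaction`, a call written as `Parse::parse(input)` with no type named would have its target inferred as `Self`, so it would call the same function again. Naming a different type selects a different impl, and the recursion disappears.

```rust
// `Parse` is a hypothetical toy trait standing in for serde's `Deserialize`.
trait Parse: Sized {
    fn parse(input: &str) -> Result<Self, String>;
}

// A separate "raw" type with the same fields (assumed for illustration).
// It has its own `Parse` impl, so parsing it never re-enters
// `Transaction::parse`.
struct RawTransaction {
    id: u128, // stand-in for uuid::Uuid
    amount: i64,
}

impl Parse for RawTransaction {
    fn parse(input: &str) -> Result<Self, String> {
        // Expected form: "<id>,<amount>"
        let (id, amount) = input
            .split_once(',')
            .ok_or_else(|| "expected `id,amount`".to_string())?;
        let id = id.trim().parse::<u128>().map_err(|e| e.to_string())?;
        let amount = amount.trim().parse::<i64>().map_err(|e| e.to_string())?;
        Ok(RawTransaction { id, amount })
    }
}

#[derive(Debug, PartialEq)]
struct Transaction {
    id: u128,
    amount: i64,
}

impl Parse for Transaction {
    fn parse(input: &str) -> Result<Self, String> {
        // `let t = Parse::parse(input)?; Ok(t)` would be inferred as
        // `<Transaction as Parse>::parse` and recurse unconditionally.
        // Going through `RawTransaction` calls a different function.
        let raw = RawTransaction::parse(input)?;
        // A cache insert could go here.
        Ok(Transaction { id: raw.id, amount: raw.amount })
    }
}
```

With this in place, `Transaction::parse("7, -1250")` goes through `RawTransaction::parse` exactly once and returns. With Serde, the same shape would presumably be a private struct that derives `Deserialize` and is converted into `Transaction`.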
Thanks for the quick reply. I'm looking to have something like a HashMap from UUID to Transaction stored elsewhere. It would be populated where the comment is (in the deserializer) so that I can easily look up deserialized Transaction instances via a UUID instead of navigating through the object graph. — Saish, Mar 10 '23 at 15:27
Yes, apologies. Switching from Scala to Rust each day gets me confused. :^) — Saish, Mar 10 '23 at 15:49
I don't think it's possible, or at least it's very unergonomic with the design of `serde`. If you can change the serialization format, you might want to (de)serialize to a format where you store only `Uuid`s in the serialized `Account`s and keep a map of the actual `Transaction`s next to them, then convert to/from that later. — cafce25, Mar 10 '23 at 15:59
If you need to pass a cache into the `Deserialize` implementation, you likely want to use `DeserializeSeed` from serde. One trick to break the recursion is to use serde's remote derive capabilities: you can annotate your `Transaction` with `#[serde(remote = "Self")]`. The derive will then generate two inherent methods instead of two trait implementations. You can use them inside your `DeserializeSeed` implementation with `let trans = Transaction::deserialize(deserializer)?;` without causing recursion. — jonasbb, Mar 10 '23 at 16:29
Okay, thanks. I was thinking some crafty combination of Rc and RefCell might also work. Will keep digging and post if I solve it. Cheers! — Saish, Mar 10 '23 at 18:27
The most straightforward way is most likely to have a wrapper type, like `Cached(T)`, that implements `deserialize` by looking into the cache first and, if the value is not found, actually calling the `deserialize` implementation of `T`. Note, however, that since `deserialize` cannot access any variables besides `static` ones, it's probably a bad idea to do it this way. Instead, you could simply create a method like `deserialize_with_cache` which takes a cache as an argument, and use that one to deserialize. — jthulhu, Mar 10 '23 at 18:40
I like that idea. (Would prefer to avoid statics where possible, yes.) Just so I get the idea straight, would Serde be able to pass the cache in to that function? Or are you thinking it would be a top-level function that accepts it instead? I'm trying to figure out how to get the framework to pass in a separate variable unrelated to the deserialization. Thanks! — Saish, Mar 10 '23 at 19:32
@Saish I think that instead of "overloading" the implementation of the deserialization, you should have another function (outside the `Deserialize` trait) that is called instead when you want to deserialize. That function should be passed (or otherwise have access to) the cache, and should decide whether it needs to call `deserialize` or can just fetch the value from the cache. — jthulhu, Mar 10 '23 at 20:31
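As a rough sketch of the suggestion above, the function below sits outside any deserialization trait, receives the cache as an argument, and only runs the decoding step on a cache miss. The name `load_with_cache`, the `decode` closure, the `amount_cents` field, and the use of `u128` in place of `Uuid` are all assumptions made for illustration; in a real program the closure would presumably wrap a Serde call.

```rust
use std::collections::HashMap;

// Stand-in for `uuid::Uuid` so the sketch needs no external crates.
type Id = u128;

#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub id: Id,
    pub amount_cents: i64, // hypothetical field
}

/// Returns the cached transaction for `id`, calling `decode` only when the
/// cache has no entry yet. `decode` stands in for the actual
/// deserialization step.
pub fn load_with_cache<F>(
    cache: &mut HashMap<Id, Transaction>,
    id: Id,
    decode: F,
) -> Result<&Transaction, String>
where
    F: FnOnce() -> Result<Transaction, String>,
{
    if !cache.contains_key(&id) {
        // Cache miss: decode once and remember the result.
        let transaction = decode()?;
        cache.insert(id, transaction);
    }
    // The key is now guaranteed to be present.
    Ok(&cache[&id])
}
```

Because the cache is an ordinary `&mut` parameter, no `static` state is needed, and a failed decode leaves the cache untouched.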
I would happily do that. I'm looking for the best of both worlds: deserializing the "easy" parts using Serde normally, and customizing for this caching piece. Do you perchance have an example of how to selectively invoke the deserialization at a point in the processing? One thought that came to mind was to see if I can use a Json object from Serde as the "root" and then stitch that into the overall deserialized result. — Saish, Mar 10 '23 at 21:29
Thanks to all. I implemented the caching after the deserialization. I think it is slightly less efficient (having to iterate through everything to cache as an extra step). But it works. :^) — Saish, Mar 12 '23 at 14:50
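For reference, caching after deserialization can be as small as one pass over the loaded accounts. This is a minimal sketch, not the asker's actual code: `index_transactions` and `amount_cents` are hypothetical names, and `u128` stands in for `Uuid` to keep the example free of external crates. The index borrows from the accounts, so the `Account` values keep ownership of their transactions.

```rust
use std::collections::HashMap;

// Stand-in for `uuid::Uuid` so the sketch needs no external crates.
type Id = u128;

#[derive(Debug, PartialEq)]
pub struct Transaction {
    pub id: Id,
    pub amount_cents: i64, // hypothetical field
}

pub struct Account {
    pub id: Id,
    pub name: String,
    pub transactions: Vec<Transaction>,
}

/// Builds an id -> &Transaction index in a single pass over accounts that
/// have already been deserialized.
pub fn index_transactions(accounts: &[Account]) -> HashMap<Id, &Transaction> {
    accounts
        .iter()
        .flat_map(|account| account.transactions.iter())
        .map(|transaction| (transaction.id, transaction))
        .collect()
}
```

The extra pass is linear in the number of transactions. Because the index holds references, it has to be rebuilt if the accounts are modified.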
I'm a bit confused on the ownership of the transactions. You say you want the account to own the transactions. Then what is stored in the cache, a reference? If so, then there's no way to actually store a reference to the transaction in the cache during deserialization, since the transaction will be moved upon being returned. You might be able to do it during deserialization if you reverse the ownership. See this playground for an example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=24d7d4c504502c49fd691de8e459be5b. Otherwise, caching after deserializing is best. — Anders Evensen, Mar 23 '23 at 06:55
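One possible reading of "reverse the ownership" is sketched below; it is not the code from the linked playground. A central map owns every `Transaction`, and each account keeps only the ids. `Ledger`, `FlatAccount`, `from_accounts`, `transactions_of`, and `amount_cents` are hypothetical names, and `u128` again stands in for `Uuid`.

```rust
use std::collections::HashMap;

// Stand-in for `uuid::Uuid` so the sketch needs no external crates.
type Id = u128;

#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub id: Id,
    pub amount_cents: i64, // hypothetical field
}

// Nested shape, as in the question: the account owns its transactions.
pub struct Account {
    pub id: Id,
    pub name: String,
    pub transactions: Vec<Transaction>,
}

// Flat shape: the account only refers to transactions by id.
pub struct FlatAccount {
    pub id: Id,
    pub name: String,
    pub transaction_ids: Vec<Id>,
}

#[derive(Default)]
pub struct Ledger {
    pub accounts: Vec<FlatAccount>,
    // The map is the single owner of all transactions.
    pub transactions: HashMap<Id, Transaction>,
}

impl Ledger {
    /// Moves every transaction out of its account and into the central map.
    pub fn from_accounts(accounts: Vec<Account>) -> Self {
        let mut ledger = Ledger::default();
        for account in accounts {
            let mut ids = Vec::with_capacity(account.transactions.len());
            for transaction in account.transactions {
                ids.push(transaction.id);
                ledger.transactions.insert(transaction.id, transaction);
            }
            ledger.accounts.push(FlatAccount {
                id: account.id,
                name: account.name,
                transaction_ids: ids,
            });
        }
        ledger
    }

    /// Resolves an account's ids back to transactions via the map.
    pub fn transactions_of<'a>(
        &'a self,
        account: &'a FlatAccount,
    ) -> impl Iterator<Item = &'a Transaction> + 'a {
        account
            .transaction_ids
            .iter()
            .filter_map(move |id| self.transactions.get(id))
    }
}
```

With this layout a lookup by id is a single map access and no references need to be kept alive. The trade-off is that walking from an account to its transactions costs one map lookup per id.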
