
I am deserializing JSON objects that come in as requests. The request body is nested, and one field is sometimes misformatted for a variety of reasons. When that happens, I still want the rest of the object. This doesn't all have to be done through serde, but right now a single bad subfield causes the whole request to be rejected. I want to deserialize everything else and just mark that field as errored. How can this be done?

E.g. the data schema might look like:

struct BigNested {
    a: Vec<A>,
    b: B, // definition omitted
}

struct A {
    keep_this: Foo,
    trouble: SometimesBad,
}

`trouble` is the field that frequently comes in malformed. I would be happy to (e.g.) turn `trouble` into a `Result<SometimesBad, Whatever>` and process it from there, but I don't know how to get serde to let me do that.

Richard Rast
  • Have you read [How to transform fields during deserialization using Serde?](https://stackoverflow.com/q/46753955/155423)? – Shepmaster Dec 09 '19 at 18:36
  • @Shepmaster I haven't seen that answer before but I have been reading the associated documentation for a while (including the `deserialize_with` attribute). However when I was following the associated documentation I ended up writing _so much code_ and felt there must be an easier way. – Richard Rast Dec 09 '19 at 19:05
  • *writing so much code and felt there must be an easier way* — and does the linked question show that way? (TL;DR: you can use intermediate automatic implementations of `Serialize` and `Deserialize`). – Shepmaster Dec 09 '19 at 19:17
  • @Shepmaster I thought I did, however the issue seems to be with the model of serde's deserializer. I can use a custom type which _correctly_ deserializes the bad data _as_ an error. However it seems to be that the `deserializer` object, which is passed in, is "poisoned" -- it remembers that it hit an error, and any further attempt to use it just returns that _same_ error; it can't understand that the associated bad object is over and it should keep trying. – Richard Rast Dec 10 '19 at 01:09
  • (cont) I suspect the cause is https://github.com/serde-rs/serde/issues/464, an old issue (2016) that was opened to address exactly this problem and appears to have been closed without a fix. – Richard Rast Dec 10 '19 at 01:10
  • I think I understand why this is the case (serde supports formats which are not self-describing, and this "recovery" is not possible in those cases) but it is a bit frustrating. – Richard Rast Dec 10 '19 at 01:11

1 Answer


> certain field is sometimes misformatted

You didn't say how malformed the incoming JSON is. Assuming it's still valid JSON, you can pull this off with Serde's struct `flatten` attribute and a customized `Deserialize` implementation:

  • The customized deserialization is written so that it never fails for valid JSON input, although it may not return a value of the expected type if the input has an unexpected shape.

  • But the unexpected fields still need to go somewhere. Serde's struct `flatten` comes in handy here to catch them, since the leftover keys of any JSON object can be collected into a `HashMap<String, Value>`.

//# serde = { version = "1.0.103", features = ["derive"] }
//# serde_json = "1.0.44"
use serde::{Deserialize, Deserializer, de::DeserializeOwned};
use serde_json::Value;
use std::collections::HashMap;

#[derive(Deserialize, Debug)]
struct A {
    keep_this: Foo,
    trouble: SometimesBad,
}

#[derive(Deserialize, Debug)]
struct Foo {
    foo: i32,
}

#[derive(Deserialize, Debug)]
struct SometimesBad {
    inner: TryParse<Bar>,

    #[serde(flatten)]
    blackhole: HashMap<String, Value>,
}

#[derive(Deserialize, Debug)]
struct Bar {
    bar: String,
}

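// Outcome of trying to read a field as `T`.
#[derive(Debug)]
enum TryParse<T> {
    Parsed(T),       // the value matched `T`
    Unparsed(Value), // valid JSON, but not a `T`; the raw value is kept
    NotPresent,      // the field was missing or `null`
}

impl<'de, T: DeserializeOwned> Deserialize<'de> for TryParse<T> {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Buffer the field as a generic JSON value first. This only fails
        // if the input is not valid JSON at all.
        match Option::<Value>::deserialize(deserializer)? {
            None => Ok(TryParse::NotPresent),
            // Then try to convert the buffered value to `T`, falling back
            // to the raw value instead of returning an error.
            Some(value) => match T::deserialize(&value) {
                Ok(t) => Ok(TryParse::Parsed(t)),
                Err(_) => Ok(TryParse::Unparsed(value)),
            },
        }
    }
}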

fn main() {
    let valid = r#"{ "keep_this": { "foo": 1 }, "trouble": { "inner": { "bar": "one"}}}"#;
    println!("{:#?}", serde_json::from_str::<A>(valid));

    let extra_field = r#"{ "keep_this": { "foo": 1 }, "trouble": { "inner": { "bar": "one"}, "extra": 2019}}"#;
    println!("{:#?}", serde_json::from_str::<A>(extra_field));

    let wrong_type = r#"{ "keep_this": { "foo": 1 }, "trouble": { "inner": { "bar": 1}}}"#;
    println!("{:#?}", serde_json::from_str::<A>(wrong_type));

    let missing_field = r#"{ "keep_this": { "foo": 1 }, "trouble": { "inner": { "baz": "one"}}}"#;
    println!("{:#?}", serde_json::from_str::<A>(missing_field));

    let missing_inner = r#"{ "keep_this": { "foo": 1 }, "trouble": { "whatever": { "bar": "one"}}}"#;
    println!("{:#?}", serde_json::from_str::<A>(missing_inner));
}

(The credit isn't all mine. Serde's issue 1583 basically has everything.)

edwardw