I'm trying to parse this kind of XML with serde and an xml crate, but I'm not sure it's possible with serde:

<vm>
    <id>15</id>
    <template>
        <tag1>content</tag1>
        <tag2>content</tag2>
        <vec1>
            <tag3>content</tag3>
            <tag4>content</tag4>
        </vec1>
        <tag2>content</tag2>
        <vec1>
            <tag3>content</tag3>
        </vec1>
        <vec2>
            <tag3>content</tag3>
        </vec2>
    </template>
</vm>

All tagX and vecX tag names are dynamic (and not necessarily unique); all other names are static and known.

The content inside template has only two possible forms. Either a representation of a (key, value) "pair": <key>value</key>

Or a representation of a "vector" (vec_key, collection of pairs): <vec_key><key>value</key><key2>value2</key2> ... </vec_key>

I'm trying to represent the data as something close to this (with the tag name as the first String):

enum Element {
    Pair(String, String),
    Vector(String, Vec<(String, String)>),
}

pub struct VM {
    id: i64,
    template: Vec<Element>,
}

So the above XML would be deserialized to something like:

[
  Pair("tag1", "content"),
  Pair("tag2", "content"),
  Vector("vec1", [("tag3", "content"),("tag4", "content")]),
  Pair("tag2", "content"),
  Vector("vec1", [("tag3", "content")]),
  Vector("vec2", [("tag3", "content")])
]

I'm open to modifying the representation a bit, but I don't want to store the data in complex nested data structures.

Is it possible with Serde?

For context, I did the same in Go with the encoding/xml package, where I was able to mix regular struct deserialization with custom deserialization (working directly with the pull parser for the custom part).
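To make the pull-parser idea concrete, here is a minimal sketch in Rust of a small state machine that turns parser events into the flat Vec<Element> described above. The Event type and the tokenize function are hypothetical stand-ins for a real pull parser (they only handle attribute-free tags, with no comments, CDATA or entities), and parse_template is an illustrative name, not an API of any crate.

```rust
#[derive(Debug, PartialEq)]
enum Element {
    Pair(String, String),
    Vector(String, Vec<(String, String)>),
}

// Hypothetical event type, standing in for what a real pull parser would emit.
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

// Toy tokenizer (assumption: attribute-free tags, no comments, CDATA or entities).
fn tokenize(xml: &str) -> Vec<Event> {
    let mut events = Vec::new();
    let mut rest = xml;
    while let Some(open) = rest.find('<') {
        // Whitespace-only text between tags is dropped.
        let text = rest[..open].trim();
        if !text.is_empty() {
            events.push(Event::Text(text.to_string()));
        }
        // Stop on an unterminated tag.
        let Some(len) = rest[open..].find('>') else { break };
        let tag = &rest[open + 1..open + len];
        match tag.strip_prefix('/') {
            Some(name) => events.push(Event::End(name.to_string())),
            None => events.push(Event::Start(tag.to_string())),
        }
        rest = &rest[open + len + 1..];
    }
    events
}

// State machine over the events inside <template>, keeping document order.
fn parse_template(events: &[Event]) -> Vec<Element> {
    let mut out = Vec::new();
    let mut inside = false; // are we between <template> and </template>?
    let mut open: Vec<&str> = Vec::new(); // open tags below <template>
    let mut text = String::new(); // text of the innermost open tag
    let mut children: Vec<(String, String)> = Vec::new(); // pairs of the open vector

    for ev in events {
        match ev {
            Event::Start(name) if name.as_str() == "template" && !inside => inside = true,
            Event::End(name) if name.as_str() == "template" && inside && open.is_empty() => {
                inside = false
            }
            _ if !inside => {} // ignore everything outside <template>
            Event::Start(name) => {
                open.push(name.as_str());
                text.clear();
            }
            Event::Text(t) => text = t.clone(),
            Event::End(_) => {
                let depth = open.len();
                let Some(name) = open.pop() else { continue };
                let value = std::mem::take(&mut text);
                if depth == 2 {
                    // Closing a tag nested in a vector: remember it as an inner pair.
                    children.push((name.to_string(), value));
                } else if depth == 1 && children.is_empty() {
                    // Closing a direct child with no nested tags: a plain pair.
                    out.push(Element::Pair(name.to_string(), value));
                } else if depth == 1 {
                    // Closing a direct child that had nested tags: a vector.
                    out.push(Element::Vector(name.to_string(), std::mem::take(&mut children)));
                }
            }
        }
    }
    out
}
```

Calling parse_template(&tokenize(xml)) keeps pairs and vectors interleaved in document order. One ambiguity remains in this sketch: an empty vector such as <vec1></vec1> is indistinguishable from an empty pair and is reported as Pair("vec1", "").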

treywelsh
  • There is some ambiguity here. Can you share the *exact* Element structure you expect given your input? – jq170727 Feb 06 '23 at 19:19
  • Thanks for reading, I edited the content. The idea is to have some configuration with (key, value) elements or grouped elements (key_vec, ((key, value), ... (keyX, valueX)). In case this helps here is the complete (and complex) xsd: https://github.com/OpenNebula/one/blob/master/share/doc/xsd/vm.xsd I'm trying to work on a simplified version – treywelsh Feb 06 '23 at 20:49
  • What I meant was the specific Rust structures for your example. For instance how should the two`` elements merge given that they both contain an identical ``? and what if the `` element content wasn't identical? Can you provide an example of a unit test for this specific case? – jq170727 Feb 07 '23 at 17:06
  • Ok sorry, I edited the question. For instance I tried this struct: `#[derive(Debug, Deserialize)] pub struct VM { id: i64, template: HashMap<String, Vec<HashMap<String, Vec<String>>>>, }` which gives this kind of result: `id: 15 template: {"vec1": [{"tag3": ["toto4"], "tag4": ["toto5"]}, {"tag3": ["toto7"]}], "tag4": [{"$value": ["toto8"]}], "vec2": [{"tag3": ["toto6"]}], "tag1": [{"$value": ["toto1"]}], "tag2": [{"$value": ["toto2"]}, {"$value": ["toto3"]}]}` But this is a complex representation – treywelsh Feb 08 '23 at 22:03
  • Taking inspiration from [issues/1775](https://github.com/serde-rs/serde/issues/1775) I did some [fiddling around](https://gist.github.com/jq170727/ad67e054a5688e59bbde57b0d46f0b79) with `serde-xml-rs` and `#[serde(field_identifier)]` but it appears the mixed tag order is causing some elements within the ` – jq170727 Feb 09 '23 at 08:04
  • Thanks for your time. In your example it seems that the last occurrence of a key (`vec1` for instance) overrides the previously associated value, so it should work for unique values. I was influenced by my experience with the Golang module; unless I find a way to do it with serde, I may use a state machine on top of a pull parser, or use xmltree wrapped in some helper struct and methods... Unfortunately I would lose serde's automatic struct filling – treywelsh Feb 09 '23 at 21:34
  • I think that will work. I've been meaning to learn serde better, so this has been an interesting puzzle. Looking at [deserialize_with](https://stackoverflow.com/a/54764617) makes me think it might also help. Also wondering if a rewriter like [LOLHTML](https://github.com/cloudflare/lol-html) could tweak the XML so that serde would have less to do, perhaps by moving the parts of the tag names that change into attributes. – jq170727 Feb 09 '23 at 22:40

1 Answer

Starting from serde's map deserialization example, I tried this:

#![allow(dead_code)]

use serde::Deserialize;
use std::{collections::HashMap, fmt};

use serde::de::{self, Deserializer, MapAccess, Visitor};

const XML: &str = r#"
<vm>
    <id>15</id>
    <template>
        <tag1>content1</tag1>
        <tag2>content2</tag2>
        <vec1>
            <tag3>content3</tag3>
            <tag4>content4</tag4>
        </vec1>
        <vec1>
            <tag3>content5</tag3>
        </vec1>
        <tag2>content6</tag2>
        <vec2>
            <tag3>content7</tag3>
        </vec2>
    </template>
</vm>
"#;

#[derive(Debug, Clone)]
struct Pair(String, String);

#[derive(Debug, Clone)]
struct Vector(String, Vec<Pair>);

#[derive(Debug, Clone)]
struct Template {
    pairs: Vec<Pair>,
    vectors: Vec<Vector>,
}

impl Template {
    fn new() -> Self {
        Template {
            pairs: Vec::new(),
            vectors: Vec::new(),
        }
    }
}

#[derive(Debug, Deserialize)]
pub struct VM {
    id: i64,
    template: Template,
}

struct TemplateVisitor;

impl<'de> Visitor<'de> for TemplateVisitor {
    type Value = Template;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a very special map")
    }

    fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
    where
        M: MapAccess<'de>,
    {
        let mut map = Template::new();

        while let Some(key) = access.next_key::<String>()? {
            // Propagate deserialization errors instead of panicking inside the visitor.
            let map_value = access.next_value::<HashMap<String, String>>()?;

            if map_value.contains_key("$value") {
                map.pairs
                    .push(Pair(key, map_value.get("$value").unwrap().clone()));
            } else {

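                // No text content, so treat the element as a vector of pairs.
                // HashMap iteration order is arbitrary, so the inner pairs
                // may not come out in document order.
                let mut vector = Vec::new();
                for (k, v) in map_value {
                    vector.push(Pair(k, v));
                }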
                map.vectors.push(Vector(key, vector));
            }
        }

        Ok(map)
    }
}

impl<'de> Deserialize<'de> for Template {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_map(TemplateVisitor {})
    }
}

fn main() {
    let obj: VM = serde_xml_rs::from_str(XML).unwrap();
    println!("{:#?}", obj);
}

Result:

VM {
    id: 15,
    template: Template {
        pairs: [
            Pair(
                "tag1",
                "content1",
            ),
            Pair(
                "tag2",
                "content2",
            ),
            Pair(
                "tag2",
                "content6",
            ),
        ],
        vectors: [
            Vector(
                "vec1",
                [
                    Pair(
                        "tag4",
                        "content4",
                    ),
                    Pair(
                        "tag3",
                        "content3",
                    ),
                ],
            ),
            Vector(
                "vec1",
                [
                    Pair(
                        "tag3",
                        "content5",
                    ),
                ],
            ),
            Vector(
                "vec2",
                [
                    Pair(
                        "tag3",
                        "content7",
                    ),
                ],
            ),
        ],
    },
}

However, I'm not fully satisfied with the map_value.contains_key("$value") check. I'm not sure it's a clean way to distinguish a pair from a vector when deserializing; maybe it could be combined with something like serde's "either string or struct" example.
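Note that the Template above keeps pairs and vectors in two separate lists, so their relative order in the document is lost, and the HashMap can reorder the pairs inside a vector. A minimal sketch of an alternative classification step follows. It assumes each child's fields can be collected as an ordered list of (name, value) tuples instead of a HashMap (the visitor that would do this is not shown), and that text content still appears under the "$value" key as in the code above. The names classify and collect_template are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Element {
    Pair(String, String),
    Vector(String, Vec<(String, String)>),
}

// Classifies one child of <template> from its tag name and its fields,
// given in document order. A "$value" field marks text content, i.e. a pair.
fn classify(key: String, mut fields: Vec<(String, String)>) -> Element {
    match fields.iter().position(|(name, _)| name.as_str() == "$value") {
        Some(i) => Element::Pair(key, fields.swap_remove(i).1),
        None => Element::Vector(key, fields),
    }
}

// Builds a single list, so pairs and vectors stay interleaved as in the XML.
fn collect_template(entries: Vec<(String, Vec<(String, String)>)>) -> Vec<Element> {
    entries
        .into_iter()
        .map(|(key, fields)| classify(key, fields))
        .collect()
}
```

This keeps the same "$value" test but produces one ordered list of elements, which is closer to the representation asked for in the question.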

I'm still looking for a better solution

treywelsh
  • [An answer of the quick-xml maintainer](https://github.com/tafia/quick-xml/issues/526#issuecomment-1434706623) – treywelsh Feb 19 '23 at 19:33