
I am using jackson-dataformat-xml (2.9) to parse XML into a JsonNode and then serialize it to JSON. The XML is very dynamic, which is why I am using JsonNode instead of binding to a POJO (e.g. the 'elementName' and 'id' names may vary).

It turns out that in the resulting JSON, one of the element keys is an empty string ("").

XML:

<elementName>
      <id type="pid">abcdef123</id>
</elementName>

Parsing logic:

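import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.ctc.wstx.stax.WstxInputFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlFactory;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class Parser {

    private final ObjectMapper jsonMapper;
    private final XmlMapper xmlMapper;

    public Parser() {
        // Assign the fields used by parseXmlResponse (not local variables).
        jsonMapper = new ObjectMapper();
        xmlMapper = new XmlMapper(new XmlFactory(new WstxInputFactory()));
    }

    public InputStream parseXmlResponse(InputStream xmlStream) {
        InputStream stream = null;

        try {
            JsonNode node = xmlMapper.readTree(xmlStream);
            stream = new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
        } catch (IOException e) {
            e.printStackTrace();
        }

        return stream;
    }
}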

Json:

Result:

{
   "elementName": {
     "id": {
        "type": "pid",
        "": "abcdef123"
     }
   }
}

Expected:

{
   "elementName": {
     "id": {
        "type": "pid",
        "value": "abcdef123"
     }
   }
}

My idea is to find every occurrence of the empty key "" and replace it with "value", either during XML deserialization or during JSON serialization. I have tried using a default serializer and a filter, but haven't got either working in a nice and concise way.

Suggestions are much appreciated.

Thank you for the help.

Possible Solution:

Based on @shoek's suggestion, I decided to write a custom serializer to avoid creating an intermediate object (ObjectNode) during the process.

Edit: refactored based on the same solution proposed by @shoek.

import com.fasterxml.jackson.databind.JsonNode;

public class CustomNode {
    private JsonNode jsonNode;

    public CustomNode(JsonNode jsonNode) {
        this.jsonNode = jsonNode;
    }

    public JsonNode getJsonNode() {
        return jsonNode;
    }
}

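import java.io.IOException;
import java.util.Iterator;
import java.util.Objects;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

public class CustomObjectsResponseSerializer extends StdSerializer<CustomNode> {

    public CustomObjectsResponseSerializer() {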
        super(CustomNode.class);
    }

    @Override
    public void serialize(CustomNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        convertObjectNode(node.getJsonNode(), jgen, provider);
    }

    private void convertObjectNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartObject();
        for (Iterator<String> it = node.fieldNames(); it.hasNext(); ) {
            String childName = it.next();
            JsonNode childNode = node.get(childName);
            // XML parser returns an empty string as value name. Replacing it with "value"
            if (Objects.equals("", childName)) {
                childName = "value";
            }

            if (childNode instanceof ArrayNode) {
                jgen.writeFieldName(childName);
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                jgen.writeFieldName(childName);
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeField(childName, childNode, jgen);
            }
        }
        jgen.writeEndObject();

    }

    private void convertArrayNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartArray();
        for (Iterator<JsonNode> it = node.elements(); it.hasNext(); ) {
            JsonNode childNode = it.next();

            if (childNode instanceof ArrayNode) {
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeValue(childNode, jgen);
            }
        }
        jgen.writeEndArray();
    }
}
WolfRevo
  • How about writing a custom filter to `serializeAllExcept` fields with empty name? https://stackoverflow.com/a/13792700/12656244 – shoek May 25 '20 at 23:03
  • What's the requirement: to convert XML to JSON? If so, can you update the question with the sample JSON you expect for the given XML? – Smile May 26 '20 at 07:18
  • @shoek I tried a filter, but it looks like it only removes fields. What I want is to rename the key whenever it is empty. – WolfRevo May 26 '20 at 08:44
  • @WolfRevo I see. Did you try `addBeanSerializerModifier`? https://stackoverflow.com/a/33053944/12656244 In your case, you would `return beanProperties.stream().map(prop -> Objects.equals("", prop.getName()) ? prop.rename(new NameTransformer() { @Override ... }) : prop)...;` in `changeProperties` – shoek May 26 '20 at 17:12
  • @shoek Thanks for pointing it out. I tried adding it to the ObjectMapper, but it did not work. I guess this is because I am not using a custom bean but a JsonNode instead. – WolfRevo May 27 '20 at 13:00

4 Answers


You also could simply post-process the JSON DOM, traverse to all objects, and rename the keys that are empty strings to "value".

Name collision: such a key may already exist and must not be overwritten (e.g. <id type="pid" value="existing">abcdef123</id>).

Usage:
(note: you should not silently suppress the exception and return null, but allow it to propagate so the caller can decide to catch and apply failover logic if required)

public InputStream parseXmlResponse(InputStream xmlStream) throws IOException {
    JsonNode node = xmlMapper.readTree(xmlStream);
    postprocess(node);
    return new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
}

Post-processing:

private void postprocess(JsonNode jsonNode) {

    if (jsonNode.isArray()) {
        ArrayNode array = (ArrayNode) jsonNode;
        Iterable<JsonNode> elements = () -> array.elements();

        // recursive post-processing
        for (JsonNode element : elements) {
            postprocess(element);
        }
    }
    if (jsonNode.isObject()) {
        ObjectNode object = (ObjectNode) jsonNode;
        Iterable<String> fieldNames = () -> object.fieldNames();

        // recursive post-processing
        for (String fieldName : fieldNames) {
            postprocess(object.get(fieldName));
        }
        // check if an attribute with empty string key exists, and rename it to 'value',
        // unless there already exists another non-null attribute named 'value' which
        // would be overwritten.
        JsonNode emptyKeyValue = object.get("");
        JsonNode existing = object.get("value");
        if (emptyKeyValue != null) {
            if (existing == null || existing.isNull()) {
                object.set("value", emptyKeyValue);
                object.remove("");
            } else {
                System.err.println("Skipping empty key value as a key named 'value' already exists.");
            }
        }
    }
}

Output: just as expected.

{
   "elementName": {
     "id": {
        "type": "pid",
        "value": "abcdef123"
     }
   }
}

EDIT: considerations on performance:

I ran a test with a large XML file (enwikiquote-20200520-pages-articles-multistream.xml, the en.wikiquote XML dump, 498.4 MB) over 100 rounds, with the following measured times (using System.nanoTime() deltas):

  • average read time (File, SSD): 2870.96 ms
    (JsonNode node = xmlMapper.readTree(xmlStream);)
  • average postprocessing time: 0.04 ms
    (postprocess(node);)
  • average write time (memory): 0.31 ms
    (new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));)

That's a fraction of a millisecond for an object tree built from a ~500 MB file, so performance is excellent and not a concern.
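The measurement described above (System.nanoTime() deltas averaged over a number of rounds) can be sketched as a small helper. The names `Timing` and `averageMillis` are hypothetical, and this is only a rough wall-clock sketch: it does no JIT warm-up or GC control, so a harness such as JMH would give more trustworthy numbers.

```java
// Hypothetical timing helper: averages System.nanoTime() deltas over several rounds.
final class Timing {

    private Timing() {
    }

    // Runs the task `rounds` times and returns the mean wall-clock duration in milliseconds.
    static double averageMillis(Runnable task, int rounds) {
        if (rounds <= 0) {
            throw new IllegalArgumentException("rounds must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) rounds / 1_000_000.0;
    }
}
```

For example, `Timing.averageMillis(() -> postprocess(node), 100)` would time only the post-processing step; `readTree` throws a checked `IOException`, so timing it would need a try/catch inside the lambda.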

Peter Walser
  • Hi @Peter Walser. Thanks for the help. What I was trying to avoid at all costs was revisiting the whole JsonNode tree to rename the empty keys. Comparing your solution with the serializer one, it looks like the serializer saves us the cost of revisiting the JsonNode tree (despite being slightly more complex in terms of code). What do you think? Thank you once again. – WolfRevo May 29 '20 at 13:35
  • Thanks for pointing out the "value" duplication scenario. Nice catch! – WolfRevo May 29 '20 at 14:31
  • Have you actually measured the overhead of traversing the JSON tree? Even for large trees, it will probably take only negligible fractions of a second. Measure it, and you may be surprised. – Peter Walser May 29 '20 at 21:42
  • I measured it for you (500 MB XML); see the edit in the answer. – Peter Walser May 29 '20 at 22:08
  • Walser, thank you for the answer and the performance check. I appreciate shoek's answer as well, but since yours does not touch the internals of the node parsing and is clean, I see less risk in going forward with your solution than in overriding the parser logic. Cheers. – WolfRevo Jun 02 '20 at 14:05

I figured out that this behaviour can be achieved via configuration: just create the XmlMapper with the appropriate module configuration. Here is the Kotlin code, but it's simple to convert to Java:

    fun jacksonCreateXmlMapper(): XmlMapper {
        val module = JacksonXmlModule()
        module.setXMLTextElementName("value")
        return XmlMapper(module)
    }

For input

<products>
    <product count="5">apple</product>
    <product count="10">orange</product>
</products>

you get:

{
  "product" : [ {
    "count" : "5",
    "value" : "apple"
  }, {
    "count" : "10",
    "value" : "orange"
  } ]
}
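For reference, a Java translation of the Kotlin configuration above might look like the following sketch. `JacksonXmlModule.setXMLTextElementName` and the `XmlMapper(JacksonXmlModule)` constructor are the same calls the Kotlin code uses; the `XmlToJson` class and its `toJson` helper are hypothetical names added for illustration.

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.JacksonXmlModule;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

// Hypothetical helper class; only the module configuration mirrors the Kotlin answer.
final class XmlToJson {

    private static final XmlMapper XML_MAPPER = createXmlMapper();
    private static final ObjectMapper JSON_MAPPER = new ObjectMapper();

    private XmlToJson() {
    }

    // Java equivalent of jacksonCreateXmlMapper(): text content of elements that
    // also carry attributes is exposed under "value" instead of the empty key.
    static XmlMapper createXmlMapper() {
        JacksonXmlModule module = new JacksonXmlModule();
        module.setXMLTextElementName("value");
        return new XmlMapper(module);
    }

    // Reads the XML into a tree and writes it out as JSON text.
    static String toJson(String xml) throws IOException {
        JsonNode node = XML_MAPPER.readTree(xml);
        return JSON_MAPPER.writeValueAsString(node);
    }
}
```

Because the text key is configured on the XML side, the JSON side needs no custom serializer or post-processing.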
Alexander Kondaurov

Copying to a new ObjectNode may solve your problem.

package com.example;

import java.util.Iterator;
import java.util.Objects;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.node.ValueNode;

public class Stackoverflow62009220 {
    public static void main(String[] args) throws JsonProcessingException {
        convert("{\"elementName\":{\"id\":{\"type\":\"pid\",\"\":\"abcdef123\"}}}");

        convert("{\"array\":[1,99,3]}");

        convert("{\"complex-array\":[null, 1, [3,7,5], {\"type\":\"pid\",\"\":\"abcdef123\"}]}");
    }

    private static void convert(String str) throws JsonProcessingException {
        JsonNode input = (new ObjectMapper()).readTree(str);
        System.out.println("in:");
        System.out.println(input);

        ObjectMapper mapper = new ObjectMapper();

        ObjectNode obj = convertObjectNode(input, mapper);

        String output = mapper.writer().writeValueAsString(obj);
        System.out.println("out:");
        System.out.println(output);
        System.out.println("----------");
    }

    private static ArrayNode convertArrayNode(JsonNode current, ObjectMapper mapper) {
        ArrayNode to = mapper.createArrayNode();
        for (Iterator<JsonNode> it = current.elements(); it.hasNext();) {
            JsonNode childNode = it.next();

            if (childNode instanceof ValueNode) {
                to.add(childNode);
            } else if (childNode instanceof ArrayNode) {
                // recurse
                to.add(convertArrayNode(childNode, mapper));
            } else if (childNode instanceof ObjectNode) {
                to.add(convertObjectNode(childNode, mapper));
            }
        }
        return to;
    }

    private static ObjectNode convertObjectNode(JsonNode current, ObjectMapper mapper) {
        ObjectNode to = mapper.createObjectNode();
        for (Iterator<String> it = current.fieldNames(); it.hasNext();) {
            String childName = it.next();
            JsonNode childNode = current.get(childName);

            if (Objects.equals("", childName)) {
                childName = "value";
            }

            if (childNode instanceof ValueNode) {
                to.set(childName, childNode);
            } else if (childNode instanceof ArrayNode) {
                to.set(childName, convertArrayNode(childNode, mapper));
            } else if (childNode instanceof ObjectNode) {
                // recurse
                to.set(childName, convertObjectNode(childNode, mapper));
            }
        }
        return to;
    }
}

The preceding code results in:

in:
{"elementName":{"id":{"type":"pid","":"abcdef123"}}}
out:
{"elementName":{"id":{"type":"pid","value":"abcdef123"}}}
----------
in:
{"array":[1,99,3]}
out:
{"array":[1,99,3]}
----------
in:
{"complex-array":[null,1,[3,7,5],{"type":"pid","":"abcdef123"}]}
out:
{"complex-array":[null,1,[3,7,5],{"type":"pid","value":"abcdef123"}]}
----------

P.S.

I couldn't find a way to use a custom serializer for an untyped JsonNode. If someone knows how, please post an answer. It may be a better solution with regard to memory usage and processing time.

shoek
  • Thanks for the help @shoek. I am wondering how much better (in terms of CPU and memory usage) this approach is than a plain string replace: `String jsonResult = StringUtils.replace(jsonMapper.writeValueAsString(dataNode), "\"\"","\"value\"");` Ideally I would like to hook a listener into the XML deserialization or the JSON serialization to check whether the field name is empty. This shouldn't be that hard... :( – WolfRevo May 28 '20 at 08:39
  • @WolfRevo In some cases, the replacement gives you a wrong result. E.g. `{"empty-value" : "", "double-quote" : "\""}` will be silently corrupted into `{"empty-value" : "value", "double-quote" : "\"value"}` – shoek May 28 '20 at 09:33
  • @WolfRevo A custom JSON serializer would be better, but as I said in the postscript, I couldn't get one working. Sorry I couldn't help you. – shoek May 28 '20 at 09:51
  • No problem @shoek. I appreciate your time. Do you think that, instead of creating another ObjectNode, we could write the JSON string directly using a custom serializer? I managed to write my own serializer (public class CustomSerializer extends StdSerializer) but couldn't get the recursion logic working yet: invoking jsonGenerator.write instead of setting ObjectNode fields with add/set. Cheers – WolfRevo May 29 '20 at 09:59
  • @WolfRevo Luckily I managed to get it working; please find my `StdSerializer` in my second answer. Your concern, recursion, is implemented by `if (childNode instanceof ObjectNode) {gen.writeFieldName(childName);this.serialize((ObjectNode) childNode, gen, provider);}` – shoek May 29 '20 at 12:59
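shoek's point about the blanket string replacement can be checked with plain Java. The sketch below uses `String.replace` in place of the `StringUtils.replace` from the comment (both replace every occurrence); `NaiveReplaceDemo` is a hypothetical name. The input contains no empty key at all, yet both values are rewritten, and the output is still syntactically valid JSON, so nothing would flag the corruption.

```java
// Hypothetical demo of the blanket textual replacement discussed in the comments.
final class NaiveReplaceDemo {

    private NaiveReplaceDemo() {
    }

    // Rewrites every pair of adjacent double quotes, wherever it occurs in the text.
    static String naiveRename(String json) {
        return json.replace("\"\"", "\"value\"");
    }

    public static void main(String[] args) {
        // {"empty-value" : "", "double-quote" : "\""} contains no empty key at all.
        String input = "{\"empty-value\" : \"\", \"double-quote\" : \"\\\"\"}";
        // prints {"empty-value" : "value", "double-quote" : "\"value"}
        System.out.println(naiveRename(input));
    }
}
```

Working on the tree (or inside a serializer) avoids this, because field names and string values are distinct there.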

Serializer version.

package com.example;

import java.io.IOException;
import java.util.Iterator;
import java.util.Objects;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.module.SimpleSerializers;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

public class Stackoverflow62009220_B {
    public static void main(String[] args) throws JsonProcessingException {
        // see https://www.baeldung.com/jackson-call-default-serializer-from-custom-serializer

        convert("{\"elementName\":{\"id\":{\"type\":\"pid\",\"\":\"abcdef123\"}}}");

        // j =  {"":"is_empty_field","num":1,"str":"aa","null_val":null,"empty_val":"","array":[3,5],"obj":{"a":"A","b":22}}
        // (simple json object)
        String j = "{\"\":\"is_empty_field\",\"num\":1,\"str\":\"aa\",\"null_val\":null,\"empty_val\":\"\",\"array\":[3,5],\"obj\":{\"a\":\"A\",\"b\":22}}";
        convert(j);

        // g = {"":"is_empty_field","num":1,"str":"aa","null_val":null,"empty_val":"","array":[3,{"":"is_empty_field","num":1,"str":"aa","null_val":null,"empty_val":"","array":[3,5],"obj":{"a":"A","b":22}}],"obj":{"":"is_empty_field","num":1,"str":"aa","null_val":null,"empty_val":"","array":[3,5],"obj":{"a":"A","b":22}}}
        // (includes an array containing object j, and an object j containing array)
        String g = " {\"\":\"is_empty_field\",\"num\":1,\"str\":\"aa\",\"null_val\":null,\"empty_val\":\"\",\"array\":[3,{\"\":\"is_empty_field\",\"num\":1,\"str\":\"aa\",\"null_val\":null,\"empty_val\":\"\",\"array\":[3,5],\"obj\":{\"a\":\"A\",\"b\":22}}],\"obj\":{\"\":\"is_empty_field\",\"num\":1,\"str\":\"aa\",\"null_val\":null,\"empty_val\":\"\",\"array\":[3,5],\"obj\":{\"a\":\"A\",\"b\":22}}}";
        convert(g);
    }

    private static void convert(String str) throws JsonProcessingException {
        JsonNode input = (new ObjectMapper()).readTree(str);
        System.out.println("in:");
        System.out.println(input);

        ObjectMapper mapper = new ObjectMapper();
        SimpleModule module = new SimpleModule();
        SimpleSerializers serializers = new SimpleSerializers();
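        serializers.addSerializer(ObjectNode.class, new MyObjectNodeSerializer());
        // Also route a root-level array through the custom code; the default
        // ArrayNode serialization would bypass MyObjectNodeSerializer for its elements.
        serializers.addSerializer(ArrayNode.class, new MyArrayNodeSerializer());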
        module.setSerializers(serializers);
        mapper.registerModule(module);

        String output = mapper.writer().writeValueAsString(input);
        System.out.println("out:");
        System.out.println(output);
        System.out.println("----------");
    }
}

class MyObjectNodeSerializer extends StdSerializer<ObjectNode> {

    public MyObjectNodeSerializer() {
        super(ObjectNode.class);
    }

    public static MyObjectNodeSerializer create() {
        return new MyObjectNodeSerializer();
    }

    @Override
    public void serialize(ObjectNode value, JsonGenerator gen, SerializerProvider provider) throws IOException {
        gen.writeStartObject();
        for (Iterator<String> it = value.fieldNames(); it.hasNext();) {
            String childName = it.next();
            JsonNode childNode = value.get(childName);

            if (Objects.equals("", childName)) {
                childName = "value";
            }

            if (childNode instanceof ArrayNode) {
                gen.writeFieldName(childName);
                MyArrayNodeSerializer.create().serialize((ArrayNode) childNode, gen, provider);
            } else if (childNode instanceof ObjectNode) {
                gen.writeFieldName(childName);
                this.serialize((ObjectNode) childNode, gen, provider);
            } else {
                provider.defaultSerializeField(childName, childNode, gen);
            }
        }
        gen.writeEndObject();
    }
}

class MyArrayNodeSerializer extends StdSerializer<ArrayNode> {

    public MyArrayNodeSerializer() {
        super(ArrayNode.class);
    }

    public static MyArrayNodeSerializer create() {
        return new MyArrayNodeSerializer();
    }

    @Override
    public void serialize(ArrayNode value, JsonGenerator gen, SerializerProvider provider) throws IOException {
        gen.writeStartArray();
        for (Iterator<JsonNode> it = value.elements(); it.hasNext();) {
            JsonNode childNode = it.next();
            if (childNode instanceof ArrayNode) {
                this.serialize((ArrayNode) childNode, gen, provider);
            } else if (childNode instanceof ObjectNode) {
                MyObjectNodeSerializer.create().serialize((ObjectNode) childNode, gen, provider);
            } else {
                provider.defaultSerializeValue(childNode, gen);
            }
        }
        gen.writeEndArray();
    }
}
shoek
  • Hi @shoek. Looks like we were both working on the serializer version at the same time. I like the idea of having a separate ArrayNode serializer. One question: am I right in saying that `MyArrayNodeSerializer.create()` creates a new instance every time that line is hit? Wouldn't it be better to create the MyArrayNodeSerializer in the MyObjectNodeSerializer constructor? Thank you so much, pal. – WolfRevo May 29 '20 at 13:15
  • @WolfRevo Yes, it creates as many `MyObjectNodeSerializer`s and `MyArrayNodeSerializer`s as there are objects and arrays in the JSON. You can optimize the code; just try. It is better to instantiate each type of serializer just once and share it across the execution because, AFAIK, (library native) serializers are thread-safe. https://stackoverflow.com/a/20956725/12656244 But note that if you put `new MyArrayNodeSerializer();` in the MyObjectNodeSerializer constructor and vice versa, they call each other indefinitely and fail with a StackOverflowError. Instantiate them somewhere else. – shoek May 30 '20 at 00:17
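Picking up the point about instantiating each serializer only once, one possible sketch is a single stateless serializer that handles both `ObjectNode` and `ArrayNode` and recurses through its own `serialize` method, so no instance is created per node and there is no constructor cycle. `EmptyKeyRenamingSerializer` and `createJsonMapper` are hypothetical names; `SimpleModule.addSerializer` is the standard Jackson registration call. Like the serializers above, it does not guard against an existing "value" key (the collision case raised in the first answer).

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

// Hypothetical single serializer for both container node types. It holds no
// state, so one shared instance can be registered and reused.
final class EmptyKeyRenamingSerializer extends StdSerializer<JsonNode> {

    static final EmptyKeyRenamingSerializer INSTANCE = new EmptyKeyRenamingSerializer();

    private EmptyKeyRenamingSerializer() {
        super(JsonNode.class);
    }

    // Builds an ObjectMapper that routes ObjectNode and ArrayNode (including a
    // root-level array) through the shared instance.
    static ObjectMapper createJsonMapper() {
        SimpleModule module = new SimpleModule();
        module.addSerializer(ObjectNode.class, INSTANCE);
        module.addSerializer(ArrayNode.class, INSTANCE);
        return new ObjectMapper().registerModule(module);
    }

    @Override
    public void serialize(JsonNode node, JsonGenerator gen, SerializerProvider provider) throws IOException {
        if (node.isObject()) {
            gen.writeStartObject();
            for (Iterator<Map.Entry<String, JsonNode>> it = node.fields(); it.hasNext(); ) {
                Map.Entry<String, JsonNode> field = it.next();
                // The XML text content arrives under the empty key; write it as "value".
                String name = field.getKey().isEmpty() ? "value" : field.getKey();
                gen.writeFieldName(name);
                serialize(field.getValue(), gen, provider);
            }
            gen.writeEndObject();
        } else if (node.isArray()) {
            gen.writeStartArray();
            for (Iterator<JsonNode> it = node.elements(); it.hasNext(); ) {
                serialize(it.next(), gen, provider);
            }
            gen.writeEndArray();
        } else {
            // Scalars and nulls: delegate to Jackson's default serializer for the node type.
            provider.defaultSerializeValue(node, gen);
        }
    }
}
```

Registering it only for `ObjectNode` and `ArrayNode` (not for `JsonNode`) matters: value nodes go through `provider.defaultSerializeValue`, which would loop back into this serializer if it were registered for a supertype of `TextNode`, `IntNode` and so on.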