I want to parse a String into an internal JSON object (or equivalent) in Java. The usual libraries, Gson and Jackson, are way too slow for my needs (> 100us for each String to Json parse, according to my benchmarks). I know there are slightly faster libraries, but looking at the benchmarks online, the gains available will be small (less than an order of magnitude improvement).

If I know the format of the JSON in advance, is there a way I can parse it much faster? For example, I know the String will be a JSON of the format:

{
   "A" : 1.0 ,
   "B" : "X"
}

i.e., I know the two keys will be "A" and "B", and the values will be a double and a string, respectively. Given this advance knowledge of the format, is there a library or some approach that parses the JSON much faster than usual?

ABC
  • Fastest way: let it parse by someone else! – fantaghirocco Oct 09 '19 at 12:08
  • The major performance issue is likely with the reflection needed by Jackson's `ObjectMapper` to dynamically determine and map the data. You might try [Jackson's Streaming parser](https://github.com/FasterXML/jackson-docs/wiki/JacksonStreamingApi) and map to your POJO class statically, i.e. in your own code. – Andreas Oct 09 '19 at 12:10
  • Maybe https://stackoverflow.com/q/2591098/85421 – user85421 Oct 09 '19 at 12:11
  • @Andreas Yes, profiling reveals `ObjectMapper` to be what's so slow. Can you provide an example with what you mean in your second sentence? – ABC Oct 09 '19 at 12:11
  • @ABC Added link in previous comment. – Andreas Oct 09 '19 at 12:12
  • "The usual libraries, Gson and Jackson, are way too slow for my needs": the writers of those libraries are likely to have put some effort into making them performant. As JSON is typically used for HTTP requests and responses, the parsing time is likely to be tiny compared to the I/O overhead. – Raedwald Oct 09 '19 at 12:12
  • @Raedwald I agree, they are probably fast within the constraint of being able to handle arbitrary data types, numbers of keys, etc. But I assume that if I have advance knowledge of what the keys and the data types are, I should be able to get something faster. – ABC Oct 09 '19 at 12:13
  • The fastest way to do something... is not to do it at all :-). If you really need high performance I/O, use a compact binary representation that is very easy to ingest without the cost of parsing text. – Raedwald Oct 09 '19 at 12:13
  • @Raedwald Unfortunately I have no choice, data that I must consume from an external source (in real time) is given to me as a plain-text JSON. – ABC Oct 09 '19 at 12:14
  • Of course you can go down to the `BufferedReader` or even `InputStream` level. But things like Jackson are complex beasts, for a reason: If you do *not* want to use them, you'll have to make absolutely and unambiguously clear: Tabs or spaces? Spaces before and after the `:`? The `{` braces in the same lines or the next? Etc. – Marco13 Oct 09 '19 at 12:24
  • I would suggest that you make absolutely sure to reuse `ObjectMapper` and measure steady state performance: with such tiny payload both Jackson and GSON should be able to decode and bind 10-100x faster than what you see. No need to do binary -- that will only get you up to 50% faster. For Jackson, can also use `jackson-module-afterburner` (https://github.com/FasterXML/jackson-modules-base/tree/master/afterburner) which can boost performance by further 30-40% – StaxMan Oct 11 '19 at 18:27
  • @StaxMan How do I reuse ObjectMapper? – ABC Oct 11 '19 at 18:29
  • @ABC just construct a single instance as static singleton, use that. Do NOT create new one for each operation. Reason for this is that all annotation-scanning, set up work is done just once per type; reusing mapper you avoid doing it after the very first time. – StaxMan Oct 11 '19 at 18:31

2 Answers

If you know the structure of the JSON payload, you can use a streaming API to read the data. I created 4 different methods to read the given JSON payload:

  1. Default Gson - use Gson class.
  2. Gson Adapter - use JsonReader from Gson library.
  3. Default Jackson - use ObjectMapper from Jackson.
  4. Jackson streaming API - use JsonParser class.

To make them comparable, all of these methods take the JSON payload as a String and return a Pojo object that holds the A and B properties. The graph below shows the differences:

[Image: graph of the timings for the four approaches]

As you can see, Jackson's Streaming API is the fastest of these 4 approaches for deserialising your JSON payload.

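To generate the graph above, the following data were used. Each row lists the time in milliseconds for 10 runs of 1,000,000 parses, followed by the average, in the same order as the list of methods above:

Default Gson:          1113 547 540 546 544 552 547 549 547 548 avg 603.3
Gson Adapter:          940 455 452 456 465 459 457 458 455 455 avg 505.2
Default Jackson:       422 266 257 262 260 267 259 262 257 259 avg 277.1
Jackson streaming API: 202 186 184 189 185 188 182 186 187 183 avg 187.2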
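To put these totals on the same scale as the 100 µs figure from the question, they can be converted to a per-parse cost. The following is a minimal sketch and not part of the original benchmark: the class name `TimingSummary` and its helper methods are hypothetical, the run times are copied from the four rows above in the order of the four methods, and treating the first run of each row as JVM warm-up is an assumption.

```java
import java.util.Arrays;

public class TimingSummary {

    // Number of parses per timed run (MAX in the benchmark code).
    static final int PARSES_PER_RUN = 1_000_000;

    /** Mean of all runs, in milliseconds. */
    static double mean(long[] runsMs) {
        return Arrays.stream(runsMs).average().orElse(Double.NaN);
    }

    /** Mean of all runs except the first, which is assumed to be JVM warm-up. */
    static double steadyStateMean(long[] runsMs) {
        return Arrays.stream(runsMs).skip(1).average().orElse(Double.NaN);
    }

    /** Converts milliseconds per run into nanoseconds per single parse. */
    static double nanosPerParse(double msPerRun) {
        return msPerRun * 1_000_000.0 / PARSES_PER_RUN;
    }

    public static void main(String[] args) {
        String[] names = {
            "Default Gson", "Gson Adapter", "Default Jackson", "Jackson streaming API"
        };
        long[][] runs = {
            {1113, 547, 540, 546, 544, 552, 547, 549, 547, 548},
            {940, 455, 452, 456, 465, 459, 457, 458, 455, 455},
            {422, 266, 257, 262, 260, 267, 259, 262, 257, 259},
            {202, 186, 184, 189, 185, 188, 182, 186, 187, 183}
        };
        for (int i = 0; i < names.length; i++) {
            double steady = steadyStateMean(runs[i]);
            System.out.printf("%-22s avg %.1f ms, steady-state avg %.1f ms, about %.0f ns per parse%n",
                    names[i], mean(runs[i]), steady, nanosPerParse(steady));
        }
    }
}
```

Taken at face value, the overall averages correspond to roughly 0.6 µs per parse for default Gson and roughly 0.19 µs for the Jackson streaming API, assuming the loop overhead is negligible.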

Benchmark code:

import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.annotation.PropertyAccessor;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;

import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class JsonApp {

    private static final String json = "{\"A\" : 1.0 ,\"B\" : \"X\"}";

    private static final int MAX = 1_000_000;

    private static List<List<Duration>> values = new ArrayList<>();

    static {
        IntStream.range(0, 4).forEach(i -> values.add(new ArrayList<>()));
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++) {
            int v = 0;
            values.get(v++).add(defaultGson());
            values.get(v++).add(gsonAdapter());
            values.get(v++).add(defaultJackson());
            values.get(v).add(jacksonJsonFactory());
        }
        values.forEach(list -> {
            list.forEach(d -> System.out.print(d.toMillis() + " "));
            System.out.println(" avg " + list.stream()
                    .mapToLong(Duration::toMillis)
                    .average().getAsDouble());
        });
    }

    static Duration defaultGson() {
        Gson gson = new Gson();

        long start = System.nanoTime();
        for (int i = MAX; i > 0; i--) {
            gson.fromJson(json, Pojo.class);
        }

        return Duration.ofNanos(System.nanoTime() - start);
    }

    static Duration gsonAdapter() throws IOException {
        PojoTypeAdapter adapter = new PojoTypeAdapter();

        long start = System.nanoTime();
        for (int i = MAX; i > 0; i--) {
            adapter.fromJson(json);
        }

        return Duration.ofNanos(System.nanoTime() - start);
    }

    static Duration defaultJackson() throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        mapper.setVisibility(PropertyAccessor.FIELD, JsonAutoDetect.Visibility.ANY);

        long start = System.nanoTime();
        for (int i = MAX; i > 0; i--) {
            mapper.readValue(json, Pojo.class);
        }

        return Duration.ofNanos(System.nanoTime() - start);
    }

    static Duration jacksonJsonFactory() throws IOException {
        JsonFactory jfactory = new JsonFactory();

        long start = System.nanoTime();
        for (int i = MAX; i > 0; i--) {
            readPartially(jfactory);
        }
        return Duration.ofNanos(System.nanoTime() - start);
    }

    static Pojo readPartially(JsonFactory jfactory) throws IOException {
        try (JsonParser parser = jfactory.createParser(json)) {

            Pojo pojo = new Pojo();

            parser.nextToken(); // START_OBJECT: {
            parser.nextToken(); // FIELD_NAME "A", skipped because the key order is known
            parser.nextToken(); // VALUE_NUMBER_FLOAT: 1.0
            pojo.A = parser.getDoubleValue();
            parser.nextToken(); // FIELD_NAME "B", skipped as well
            parser.nextToken(); // VALUE_STRING: "X"
            pojo.B = parser.getValueAsString();

            return pojo;
        }
    }
}

class PojoTypeAdapter extends TypeAdapter<Pojo> {

    @Override
    public void write(JsonWriter out, Pojo value) {
        throw new IllegalStateException("Implement me!");
    }

    @Override
    public Pojo read(JsonReader in) throws IOException {
        if (in.peek() == com.google.gson.stream.JsonToken.NULL) {
            in.nextNull();
            return null;
        }

        Pojo pojo = new Pojo();

        in.beginObject();
        in.nextName();
        pojo.A = in.nextDouble();
        in.nextName();
        pojo.B = in.nextString();

        return pojo;
    }
}

class Pojo {

    double A;
    String B;

    @Override
    public String toString() {
        return "Pojo{" +
                "A=" + A +
                ", B='" + B + '\'' +
                '}';
    }
}

Note: if you need really precise data, write the benchmarks with the excellent JMH package.
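Beyond the four library-based approaches measured above, a format as rigid as the one in the question can also be scanned by hand. The sketch below is an illustration only and was not part of the benchmark, so whether it beats the streaming API is untested. It uses only `String.indexOf` and `Double.parseDouble`. The class `FixedFormatParser` is hypothetical, and it assumes that "A" always precedes "B", that the keys contain no colon or comma, and that the string value contains no escaped quotes.

```java
public class FixedFormatParser {

    /** Minimal holder mirroring the Pojo class from the benchmark (hypothetical). */
    public static final class Pojo {
        public double a;
        public String b;
    }

    /** Parses {"A": <number>, "B": "<string>"} under the assumptions stated above. */
    public static Pojo parse(String json) {
        // Value of "A": the text between the first ':' and the following ','.
        int colonA = json.indexOf(':');
        int comma = json.indexOf(',', colonA + 1);
        // Value of "B": the quoted text after the next ':'.
        int colonB = json.indexOf(':', comma + 1);
        int openQuote = json.indexOf('"', colonB + 1);
        int closeQuote = json.indexOf('"', openQuote + 1);
        if (colonA < 0 || comma < 0 || colonB < 0 || openQuote < 0 || closeQuote < 0) {
            throw new IllegalArgumentException("Unexpected format: " + json);
        }

        Pojo pojo = new Pojo();
        pojo.a = Double.parseDouble(json.substring(colonA + 1, comma).trim());
        pojo.b = json.substring(openQuote + 1, closeQuote);
        return pojo;
    }

    public static void main(String[] args) {
        Pojo pojo = parse("{\"A\" : 1.0 ,\"B\" : \"X\"}");
        System.out.println(pojo.a + " " + pojo.b);
    }
}
```

Input that breaks these assumptions is either rejected or silently misread, so this kind of parser only makes sense when the producer's exact output format is guaranteed.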

Michał Ziober
  • This is excellent. Thanks for the effort you put into this answer. – ABC Oct 10 '19 at 03:50
  • I would second recommendation of using JMH since there are many things that can twist results -- in this case number of repetitions seems bit low to get to steady state, for example, and all runs are in same JVM. On plus side it should be very easy to add JVM via annotations and just use code above. – StaxMan Oct 11 '19 at 18:29
  • @StaxMan, thanks for the comment. I just wanted to show the difference between the 4 approaches, and that the `Streaming API` is more "stable" from the first iteration than the other ways. Of course, this test is not complete, since only two libraries and two ways of using each are tested. On the other hand, the test is easy to run, and everyone should be able to check how it behaves on their own computer with a different `JVM`. I know it is not as perfect and precise as it could be, but I wanted to help with deciding which approach to choose. I hope it will not be misleading for anyone. – Michał Ziober Oct 11 '19 at 19:26
  • @MichałZiober totally, and I noticed (after starting to write a comment) that you have mentioned jmh. Graph does look solid so I assume timings are probably not too far. I don't think it should be misleading. Stability makes sense, too, as there's much less code for JVM to optimize. – StaxMan Oct 11 '19 at 21:32

You can try BSON. BSON is a binary object format, and parsing with its Java library (`org.bson.Document`) should run faster than most JSON libraries:

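 import org.bson.Document;

 public class BsonApp {
     public static void main(String[] args) {
         // Parse the JSON text into a BSON Document.
         Document root = Document.parse("{ \"A\" : 1.0, \"B\" : \"X\" }");

         // Typed getters avoid manual casts.
         System.out.println(root.getDouble("A"));
         System.out.println(root.getString("B"));
     }
 }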
Mateo