
I have a JSON document that looks like this: JSON

It's a collection of arrays: findings, assets, assetGroups, etc. I wrote a function that takes the filename and the requested array name, and returns an ArrayList<> of the array entries as strings (which I re-parse to JSON on the client side). This works great when the files are smaller, but this one file is over 1.6GB in size, so I run out of memory if I try to instantiate it all as a JSONObject. I want to try the Jackson or Gson streaming APIs, but I'm getting wrapped around the axle trying to mix streaming APIs with direct DOM access. For example, I want to stream the JSON until I reach the "assetGroups" node, then iterate over that array and return the List<> of its contents. Does that make sense? Any help?

PaulHoran
  • Is there a reason not to just read and discard everything until then? – chrylis -cautiouslyoptimistic- Jun 12 '22 at 02:48
  • https://stackoverflow.com/questions/54817985/how-to-parse-a-huge-json-file-without-loading-it-in-memory have you tried this ? – satyesht Jun 12 '22 at 03:12
  • I would prefer not to mix streaming and "DOM"-like approaches. Streaming means a sequence of events, which in turn means a finite state machine; a simplified custom one in our case. You might find these useful: https://stackoverflow.com/questions/64422516/how-to-update-json-value-using-java/64424791#64424791 https://stackoverflow.com/questions/61868825/split-a-large-json-file-into-smaller-json-files-using-java/61918904#61918904 – AnatolyG Jun 12 '22 at 08:52

1 Answer


(The following is for Gson)

You would probably have to start with a JsonReader to parse the JSON document in a streaming way. You would then first use its beginObject() method to enter the top level JSON object. Afterwards in a loop guarded by a hasNext() check you would obtain the next member name with nextName() and compare it with the desired name. If the name is not the one you are interested in, you can call skipValue() to skip its value:

JsonReader jsonReader = new JsonReader(reader);
jsonReader.beginObject();

while (jsonReader.hasNext()) {
    String name = jsonReader.nextName();

    if (name.equals(desiredName)) {

        ... // extract the data; see next section of this answer
        
        // Alternatively you could also return here already, ignoring the remainder
        // of the JSON data
        
    } else {
        jsonReader.skipValue();
    }
}

jsonReader.endObject();

if (jsonReader.peek() != JsonToken.END_DOCUMENT) {
    throw new IllegalArgumentException("Trailing data after JSON value");
}

You might also want to add checks to verify that the desired member actually exists in the JSON document, and to verify that it only exists exactly once and not multiple times.
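The existence and uniqueness checks mentioned above could be sketched roughly as follows, assuming Gson is on the classpath. `MemberCheck` and `countTopLevelMember` are hypothetical names chosen for illustration, and the sketch only counts occurrences; real code would extract the value on the first match instead of skipping it.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MemberCheck {
    // Hypothetical helper: counts how often desiredName occurs as a member
    // of the top-level JSON object, skipping every value along the way.
    static int countTopLevelMember(Reader source, String desiredName) {
        int occurrences = 0;
        try (JsonReader jsonReader = new JsonReader(source)) {
            jsonReader.beginObject();
            while (jsonReader.hasNext()) {
                if (jsonReader.nextName().equals(desiredName)) {
                    occurrences++;
                }
                // Values are skipped, so memory use stays small
                jsonReader.skipValue();
            }
            jsonReader.endObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        String json = "{\"findings\":[1],\"assetGroups\":[2],\"assetGroups\":[3]}";
        int count = countTopLevelMember(new StringReader(json), "assetGroups");
        if (count != 1) {
            System.out.println("Expected exactly one member, found " + count);
        }
    }
}
```

In the extraction loop itself, the same idea reduces to a counter (or a boolean flag) that is checked after `endObject()`.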

If you only want to write back a subset of the JSON document without performing any content modifications to it, then you don't need to parse it into DOM objects. Instead you could directly read from the JsonReader and write to a JsonWriter:

static void transferTo(JsonReader reader, JsonWriter writer) throws IOException {
    NumberHolder numberHolder = new NumberHolder();
    int nestingDepth = 0;

    while (true) {
        JsonToken token = reader.peek();
        switch (token) {
            case BEGIN_ARRAY:
                reader.beginArray();
                writer.beginArray();
                nestingDepth++;
                break;
            case END_ARRAY:
                reader.endArray();
                writer.endArray();

                nestingDepth--;
                if (nestingDepth <= 0) {
                    return;
                }

                break;
            case BEGIN_OBJECT:
                reader.beginObject();
                writer.beginObject();
                nestingDepth++;
                break;
            case NAME:
                writer.name(reader.nextName());
                break;
            case END_OBJECT:
                reader.endObject();
                writer.endObject();

                nestingDepth--;
                if (nestingDepth <= 0) {
                    return;
                }

                break;
            case BOOLEAN:
                writer.value(reader.nextBoolean());
                break;
            case NULL:
                reader.nextNull();
                writer.nullValue();
                break;
            case NUMBER:
                // Read the number as string
                String numberAsString = reader.nextString();

                // Slightly hacky workaround to preserve the original number value
                // without having to parse it (which could lead to precision loss)
                numberHolder.value = numberAsString;
                writer.value(numberHolder);

                break;
            case STRING:
                writer.value(reader.nextString());
                break;
            case END_DOCUMENT:
                throw new IllegalStateException("Unexpected end of document");
            default:
                throw new AssertionError("Unknown JSON token: " + token);
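        }

        // A scalar value at the top level is complete after a single token
        if (nestingDepth == 0) {
            return;
        }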
    }
}

You would call this method at the location marked with "extract the data" in the first code snippet.
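`transferTo` relies on a `NumberHolder` class that is not shown here. One possible definition, assuming that `JsonWriter.value(Number)` writes the result of the number's `toString()`, is a small mutable `Number` subclass. This is only a sketch and not necessarily the class the answer had in mind; in particular, the numeric accessors below are my own assumption about reasonable behavior.

```java
class NumberHolder extends Number {
    // Set to the raw JSON number text before passing the holder to the writer
    public String value;

    // Returning the raw text avoids parsing it and losing precision
    @Override
    public String toString() {
        return value;
    }

    // The accessors parse on demand; the transfer code never needs them
    @Override
    public int intValue() {
        return (int) doubleValue();
    }

    @Override
    public long longValue() {
        return (long) doubleValue();
    }

    @Override
    public float floatValue() {
        return (float) doubleValue();
    }

    @Override
    public double doubleValue() {
        return Double.parseDouble(value);
    }
}
```

Reusing a single holder instance, as `transferTo` does, avoids allocating a new object for every number in a large document.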

If instead you do need the relevant JSON document section as DOM, then you can first use Gson.getAdapter to obtain the adapter, for example for your List<...> or for JsonArray (the generic DOM class; here it is less likely that you risk any precision loss during conversion). And then you can use read(JsonReader) of that adapter at the location marked with "extract the data" in the first code snippet.
I would not recommend directly using Gson.fromJson(JsonReader, ...) because it changes the configuration of the given JsonReader; unfortunately this pitfall is not properly documented at the moment (Gson issue).
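For the use case in the question (one JSON string per array entry), the adapter approach might look roughly like this, assuming Gson is on the classpath. `ArrayExtractor` and `extractArray` are hypothetical names; the sketch streams to the desired member, then reads one array element at a time as a `JsonElement` and stores its JSON text.

```java
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ArrayExtractor {
    // Hypothetical helper: returns every element of the top-level array
    // member named desiredName as its own JSON string.
    static List<String> extractArray(Reader source, String desiredName) {
        TypeAdapter<JsonElement> adapter = new Gson().getAdapter(JsonElement.class);
        List<String> results = new ArrayList<>();
        try (JsonReader jsonReader = new JsonReader(source)) {
            jsonReader.beginObject();
            while (jsonReader.hasNext()) {
                if (jsonReader.nextName().equals(desiredName)) {
                    jsonReader.beginArray();
                    while (jsonReader.hasNext()) {
                        // Only one array element exists as DOM at any time
                        JsonElement element = adapter.read(jsonReader);
                        results.add(element.toString());
                    }
                    jsonReader.endArray();
                } else {
                    jsonReader.skipValue();
                }
            }
            jsonReader.endObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        String json = "{\"findings\":[{\"id\":1}],"
                + "\"assetGroups\":[{\"name\":\"a\"},{\"name\":\"b\"}]}";
        for (String entry : extractArray(new StringReader(json), "assetGroups")) {
            System.out.println(entry);
        }
    }
}
```

The returned list still holds every entry of the requested array, so if that single array makes up most of a very large file, handing each entry to the caller as it is read would keep memory use lower.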

Note that neither approach preserves the original JSON formatting, but content-wise that should not make a difference.

Marcono1234
  • Thanks! This looks like what I'm after. Question: you said 'You would call this method at the location marked with "extract the data" in the first code snippet.' Can you show me what that call would look like? I'm not writing to a file or System.out - I'm just creating an ArrayList<> and each entry will be a String that is the JSON element from the parent array. – PaulHoran Jun 27 '22 at 17:59
  • @PaulHoran, do you really need this as `ArrayList` (or in general `List`), or just in some form which you can then serialize to a JSON string again and send back to the client (for example as Gson `JsonArray`)? In your question you mentioned "client"; is there an open connection (e.g. `OutputStream`) to the client to which you want to write the JSON data? Do you need to perform further manipulations to the extracted data before sending it to the client? – Marcono1234 Jun 27 '22 at 21:21
  • Thanks - it is a List. Here's the declaration: `List<JSONResult> results = new ArrayList<>();` JSONResult is a local class that's just a single String: `class JSONResult { public String datum; public JSONResult(String datum) { this.datum = datum; } }` I'm building up each row of the JSON as a string and appending it to the results list, then returning that as a Stream to the caller. – PaulHoran Jun 28 '22 at 22:18