20

I have a file that contains a JSON array of objects:

[ { "test1": "abc" }, { "test2": [1, 2, 3] } ]

I wish to use Jackson's JsonParser to take an InputStream from this file, and at every call to .next(), have it return an object from the array until it runs out of objects or fails.

Is this possible?

Use case: I have a large file containing a JSON array with a large number of objects of varying schemas. I want to read one object at a time to avoid loading everything into memory.

EDIT:

I completely forgot to mention: my input is a string that slowly accumulates JSON over time. I was hoping to parse it object by object, removing each parsed object from the string.

But I suppose that doesn't matter! I can do this manually so long as the JsonParser will tell me the index into the string.
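A rough sketch of that index-tracking idea (this assumes Jackson 2.x and a character-based source such as a String; accumulatedString is the hypothetical buffer being appended to, and getCurrentLocation().getCharOffset() reports roughly how far into it the parser has read):

ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(accumulatedString);
if (parser.nextToken() != JsonToken.START_ARRAY) {
    throw new IllegalStateException("Expected an array");
}
while (parser.nextToken() == JsonToken.START_OBJECT) {
    JsonNode node = mapper.readTree(parser);
    // approximate character offset just past the object that was read;
    // everything before it could be trimmed off the accumulating string
    long consumed = parser.getCurrentLocation().getCharOffset();
    // process node...
}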

Programmer9000

4 Answers

53

Yes, you can achieve this sort of part-streaming-part-tree-model processing style using an ObjectMapper:

ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(new File(...));
if(parser.nextToken() != JsonToken.START_ARRAY) {
  throw new IllegalStateException("Expected an array");
}
while(parser.nextToken() == JsonToken.START_OBJECT) {
  // read everything from this START_OBJECT to the matching END_OBJECT
  // and return it as a tree model ObjectNode
  ObjectNode node = mapper.readTree(parser);

  // do whatever you need to do with this object
}

parser.close();
Ian Roberts
  • Hey Ian, after nearly 2 years this code actually still works and saved my day. Just trying to confirm: every time the mapper does readTree until the matching END_OBJECT token, the "cursor" of the parser is also moved there, right? So if I do another `parser.nextToken()` right after the while loop, I should be reading the next object after what's just been read, correct? – James Jiang Jul 21 '16 at 23:43
  • @JamesJiang correct. The `readTree` given a parser positioned on the `START_OBJECT` will consume events from the parser until it reaches the matching `END_OBJECT` and will leave the parser positioned on that. – Ian Roberts Aug 26 '16 at 15:02
  • Hmm, I tried this, but I'm getting a `com.fasterxml.jackson.core.JsonParseException: Unexpected character (',' (code 44)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')` exception. But a JSON list is supposed to be separated by commas, so I'm not sure why Jackson is complaining... any tips? – SnoopDougg Oct 17 '18 at 22:05
  • @SnoopDougg that looks to me like an error with the JSON, maybe a property with no value (`{"foo": ,"bar":"baz"}`). Or possibly if you've got a series of objects at the top level that are separated by commas but _not_ surrounded by overall array brackets (`[]`) - I know this technique can cope with either a well formed array (`[{...},{...}]`) or a stream of objects (`{...}{...}`) but not if you have the commas without the brackets. – Ian Roberts Oct 19 '18 at 12:30
  • I was trying to separate the objects in my stream with commas. Removing the commas fixed it, thanks! Now I just pass (`{...}{...}{...}`) – SnoopDougg Oct 25 '18 at 18:03
  • Hi, thank you for this helpful discussion. Can you please explain: I use com.fasterxml.jackson.databind.ObjectMapper for parsing quite large JSON (~100 MB), which is a set of several attributes, each of which is in turn an array of specific objects. I use a wrapper for one attribute of interest; the wrapper has a single property, a List of objects mapped to DOs. Then in a for-loop I parse (and save to db) each node. I'm looking for a way to speed up the parsing-saving process. Will the approach above (reading nodes one by one) improve the parsing speed? – Andrey M. Stepanov Feb 18 '19 at 13:12
  • @AndreyM.Stepanov I suggest you ask your own question (possibly linking back to this one) as comment threads aren't really suited to this depth of discussion – Ian Roberts Mar 06 '19 at 15:31
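Following up on the comment thread: for the bracket-less stream-of-objects case ({...}{...}), Jackson's MappingIterator is another option worth knowing about. A minimal sketch, assuming Jackson 2.6+ and the same parser setup as in the answer above:

ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(new File(...));
// readValues() yields one tree per root-level object in the stream
MappingIterator<ObjectNode> objects = mapper.readerFor(ObjectNode.class).readValues(parser);
while (objects.hasNext()) {
    ObjectNode node = objects.next();
    // do whatever you need to do with this object
}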
18

What you are looking for is the Jackson Streaming API. Here is a code snippet using it that could help you achieve what you need.

JsonFactory factory = new JsonFactory();
JsonParser parser = factory.createJsonParser(new File(yourPathToFile));

JsonToken token = parser.nextToken();
if (token == null) {
    // return or throw exception
}

// the first token is supposed to be the start of array '['
if (!JsonToken.START_ARRAY.equals(token)) {
    // return or throw exception
}

// iterate through the content of the array
while (true) {

    token = parser.nextToken();
    // stop at the end of the input or at anything that is not the start of an object
    if (token == null || !JsonToken.START_OBJECT.equals(token)) {
        break;
    }

    // parse your objects by means of parser.getXxxValue() and/or other parser's methods

}
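To make the "parse your objects" step concrete, here is a rough sketch of token-level handling for the sample document in the question (the field names test1/test2 come from that sample; the rest is an assumption about what you might do with them):

// inside the while loop, once START_OBJECT has been seen
while (parser.nextToken() != JsonToken.END_OBJECT) {
    String fieldName = parser.getCurrentName();
    parser.nextToken(); // advance to the field's value
    if ("test1".equals(fieldName)) {
        System.out.println(fieldName + " = " + parser.getText()); // e.g. "abc"
    } else if ("test2".equals(fieldName)) {
        // the value is an array of numbers; read it element by element
        while (parser.nextToken() != JsonToken.END_ARRAY) {
            System.out.println(fieldName + "[] = " + parser.getIntValue());
        }
    } else {
        parser.skipChildren(); // ignore nested content of fields we do not care about
    }
}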
pgiecek
  • Just for information: the method createJsonParser is now deprecated; you can use createParser instead. – anthony Nov 15 '18 at 11:23
5

This example reads custom objects directly from a stream:

source is a java.io.File

ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser( source );
if ( parser.nextToken() != JsonToken.START_ARRAY ) {
    throw new Exception( "no array" );
}
while ( parser.nextToken() == JsonToken.START_OBJECT ) {
    CustomObj custom = mapper.readValue( parser, CustomObj.class );
    System.out.println( "" + custom );
}
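For completeness, a hypothetical CustomObj matching the first object in the question's sample data could look like the class below (the class name and field are assumptions; @JsonIgnoreProperties lets objects with other schemas deserialize without errors):

@JsonIgnoreProperties(ignoreUnknown = true)
public class CustomObj {
    private String test1;

    public String getTest1() { return test1; }
    public void setTest1(String test1) { this.test1 = test1; }

    @Override
    public String toString() { return "CustomObj{test1=" + test1 + "}"; }
}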
stacker
4

This is a late answer that builds on Ian Roberts' answer. You can also use a JsonPointer to find the start position if the array is nested inside a document. This avoids hand-coding the slightly cumbersome streaming-token approach just to reach the start point. In this case the basePath is "/", but it can be any path that JsonPointer understands.

Path sourceFile = Paths.get("/path/to/my/file.json");
// Point the basePath to a starting point in the file
JsonPointer basePath = JsonPointer.compile("/");
ObjectMapper mapper = new ObjectMapper();
try (InputStream inputSource = Files.newInputStream(sourceFile);
     JsonParser baseParser = mapper.getFactory().createParser(inputSource);
     JsonParser filteredParser = new FilteringParserDelegate(baseParser,
                    new JsonPointerBasedFilter(basePath), false, false);) {
    // Call nextToken once to initialize the filteredParser
    JsonToken basePathToken = filteredParser.nextToken();
    if (basePathToken != JsonToken.START_ARRAY) {
        throw new IllegalStateException("Base path did not point to an array: found " 
                                       + basePathToken);
    }
    while (filteredParser.nextToken() == JsonToken.START_OBJECT) {
        // Parse each object inside of the array into a separate tree model 
        // to keep a fixed memory footprint when parsing files 
        // larger than the available memory
        JsonNode nextNode = mapper.readTree(filteredParser);
        // Consume/process the node for example:
        JsonPointer fieldRelativePath = JsonPointer.compile("/test1");
        JsonNode valueNode = nextNode.at(fieldRelativePath);
        if (!valueNode.isValueNode()) {
            throw new IllegalStateException("Did not find value at "
                    + fieldRelativePath.toString() 
                    + " after setting base to " + basePath.toString());
        }
        System.out.println(valueNode.asText());
    }
}
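To illustrate the "any path that JsonPointer understands" point: if the array were nested under a top-level field, say items (a made-up field name), only the pointer would need to change; the rest of the code above stays the same:

// hypothetical document: { "meta": { ... }, "items": [ { "test1": "abc" }, ... ] }
JsonPointer basePath = JsonPointer.compile("/items");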
Peter