Thanks to Abhijit Sarkar's answer for leading the way.
I needed to download a large JSON stream and break it into small, manageable pieces that can be processed one at a time.
The JSON is composed of objects that have very large properties: such properties can be written to a file as they are read, and thus left out of the unmarshalled JSON object.
Another use case is to download a JSON stream object by object, process it like a map/reduce algorithm and produce a single output without having to load the whole stream in memory.
Yet another use case is to read a big JSON file and only pick a few objects based on a condition, while unmarshalling to Plain Old Java Objects.
Here is an example: we'd like to stream a very large JSON file that is an array, and we'd like to retrieve only the first object in the array.
Given this big file on a server, available at http://example.org/testings.json:
[
{ "property1": "value1", "property2": "value2", "property3": "value3" },
{ "property1": "value1", "property2": "value2", "property3": "value3" },
... 1446481 objects => a file of 104 MB => takes quite a long time to download...
]
Each row of this JSON array can be parsed as this object:
@lombok.Data
public class Testing {
String property1;
String property2;
String property3;
}
You need this interface to make the parsing code reusable:
import com.fasterxml.jackson.core.JsonParser;
import java.io.IOException;
@FunctionalInterface
public interface JsonStreamer<R> {
/**
* Parse the given JSON stream, process it, and optionally return an object.<br>
* The returned object can represent a downsized parsed version of the stream, or the result of a map/reduce processing, or null...
*
* @param jsonParser the parser to use while streaming JSON for processing
* @return the optional result of the process (can be {@link Void} if processing returns nothing)
* @throws IOException on streaming problem (you are also strongly encouraged to throw HttpMessageNotReadableException on parsing error)
*/
R stream(JsonParser jsonParser) throws IOException;
}
And this class to do the parsing, as an HttpMessageConverter that hands the response body to the JsonStreamer:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import lombok.AllArgsConstructor;
import org.springframework.http.HttpInputMessage;
import org.springframework.http.HttpOutputMessage;
import org.springframework.http.MediaType;
import org.springframework.http.converter.HttpMessageConverter;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
@AllArgsConstructor
public class StreamingHttpMessageConverter<R> implements HttpMessageConverter<R> {
private final JsonFactory factory;
private final JsonStreamer<R> jsonStreamer;
@Override
public boolean canRead(Class<?> clazz, MediaType mediaType) {
return MediaType.APPLICATION_JSON.isCompatibleWith(mediaType);
}
@Override
public boolean canWrite(Class<?> clazz, MediaType mediaType) {
return false; // We only support reading from an InputStream
}
@Override
public List<MediaType> getSupportedMediaTypes() {
return Collections.singletonList(MediaType.APPLICATION_JSON);
}
@Override
public R read(Class<? extends R> clazz, HttpInputMessage inputMessage) throws IOException {
try (InputStream inputStream = inputMessage.getBody();
JsonParser parser = factory.createParser(inputStream)) {
return jsonStreamer.stream(parser);
}
}
@Override
public void write(R result, MediaType contentType, HttpOutputMessage outputMessage) {
throw new UnsupportedOperationException();
}
}
Finally, here is the code that streams the HTTP response, parses the JSON array and returns only the first unmarshalled object:
// In a Spring application you should inject these (@Autowired) rather than instantiate them:
JsonFactory jsonFactory = new JsonFactory();
ObjectMapper objectMapper = new ObjectMapper();
RestTemplateBuilder restTemplateBuilder = new RestTemplateBuilder();
// If detectRequestFactory is true (default): HttpComponentsClientHttpRequestFactory will be used and it will consume the entire HTTP response, even if we close the stream early
// If detectRequestFactory is false: SimpleClientHttpRequestFactory will be used and it will close the connection as soon as we ask it to
RestTemplate restTemplate = restTemplateBuilder.detectRequestFactory(false).messageConverters(
new StreamingHttpMessageConverter<>(jsonFactory, jsonParser -> {
// While you use a low-level JsonParser to not load everything in memory at once,
// you can still profit from smaller object mapping with the ObjectMapper
if (!jsonParser.isClosed() && jsonParser.nextToken() == JsonToken.START_ARRAY) {
if (!jsonParser.isClosed() && jsonParser.nextToken() == JsonToken.START_OBJECT) {
return objectMapper.readValue(jsonParser, Testing.class);
}
}
return null;
})
).build();
final Testing firstTesting = restTemplate.getForObject("http://example.org/testings.json", Testing.class);
log.debug("First testing object: {}", firstTesting);
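The map/reduce and "pick a few objects based on a condition" use cases mentioned at the top follow the same pattern: loop over the array with the low-level JsonParser and map one element at a time with the ObjectMapper. Below is a minimal sketch of that loop, assuming the input is a JSON array of objects shaped like Testing. The names StreamingCount, Row and countMatching are made up for this illustration; Row is a Lombok-free stand-in for Testing so the snippet compiles on its own, and it reads from any InputStream rather than going through RestTemplate.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;
import java.util.function.Predicate;

// Hypothetical helper, not part of the classes above: it only illustrates the loop.
class StreamingCount {

    // Stand-in for the Testing class, with public fields instead of Lombok accessors.
    public static class Row {
        public String property1;
        public String property2;
        public String property3;
    }

    /**
     * Streams a JSON array, unmarshals one element at a time and counts
     * the elements that satisfy the given condition.
     * Only one Row is held in memory at any moment.
     */
    static long countMatching(InputStream json, Predicate<Row> condition) throws IOException {
        ObjectMapper objectMapper = new ObjectMapper();
        long count = 0;
        try (JsonParser parser = objectMapper.getFactory().createParser(json)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected the JSON document to start with an array");
            }
            // Each nextToken() call lands either on the next element or on END_ARRAY
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Row row = objectMapper.readValue(parser, Row.class); // "map" step
                if (condition.test(row)) {                           // filter
                    count++;                                         // "reduce" step
                }
            }
        }
        return count;
    }
}
```

The body of that try block could equally be passed as the JsonStreamer lambda given to StreamingHttpMessageConverter, in which case the count would be computed while the HTTP response is still being downloaded. For instance, countMatching(stream, row -> "value1".equals(row.property1)) would count the rows whose property1 is "value1".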