I'm trying to process a very large list of items from a JSON file and store the resulting list as another JSON file. But I can't figure out how to do this without loading everything into memory (and the memory constraint is critical here).
In plain Java, this would be like:
- load a buffer from the input file (a simple [ {item1}, {item2} ] list) and parse it (with a streaming parser)
- for all items in the buffer, call the unitary processing service in a loop
- for all results add to the writer buffer for the output JSON file
- loop until done
- close JSON writer (and write end of list inside file).
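Roughly, the plain-Java loop I have in mind looks like the sketch below. It is only an illustration: the class name `StreamingJsonList`, the `process` method and the hand-rolled brace-counting scanner are assumptions made for this example (a real implementation would more likely use a proper streaming parser), and items are handled as raw JSON strings rather than as `InputItem` objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.UnaryOperator;

public class StreamingJsonList {

    /**
     * Copies a JSON array of objects from in to out, passing each top-level
     * object (as raw JSON text) through processItem. Only the current item
     * is held in memory. Hypothetical helper, not part of Camel.
     */
    public static void process(Reader in, Writer out, UnaryOperator<String> processItem)
            throws IOException {
        StringBuilder item = new StringBuilder();
        int depth = 0;            // nesting depth of { } inside the current item
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        boolean first = true;     // controls the comma between output items

        out.write('[');           // open the output list once
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (depth == 0 && ch != '{') {
                continue;         // skip '[', ']', ',' and whitespace between items
            }
            item.append(ch);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                depth--;
                if (depth == 0) {
                    // One complete item: process it and write the result immediately.
                    if (!first) {
                        out.write(',');
                    }
                    out.write(processItem.apply(item.toString()));
                    first = false;
                    item.setLength(0);
                }
            }
        }
        out.write(']');           // write the end of the list inside the file
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // String::toUpperCase stands in for the real processing service.
        process(new StringReader("[ {\"a\":1}, {\"b\":2} ]"), out, String::toUpperCase);
        System.out.println(out);
    }
}
```

With real files, the `Reader` and `Writer` would be buffered file streams, so memory use depends on the size of one item rather than on the size of the whole list. This is the behaviour I'm trying to reproduce in a Camel route.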
Camel seems to have a streaming service for reading (actually I'm not even sure it works - the jsonpath seems to load everything into memory too... but that's another issue), but it seems I cannot merge the results without having everything in memory before the actual write.
My first attempt uses an aggregator associated with the split. In this case the result file is OK, but everything is clearly loaded into memory, so it runs out of memory (OOM):
<route>
  <from uri="file:/mytestdirectory/?fileName=input.json" />
  <split streaming="true" aggregationStrategy="#class:com.MyListAggregator">
    <jsonpath writeAsString="true">$[*]</jsonpath>
    <unmarshal>
      <json unmarshalType="com.something.InputItem"
            namingStrategy="UPPER_CAMEL_CASE"></json>
    </unmarshal>
    <bean beanType="com.something.Processor"
          method="doSomething" />
  </split>
  <marshal>
    <json namingStrategy="LOWER_CAMEL_CASE" useList="true" />
  </marshal>
  <to uri="file:/temp/to.json"></to>
</route>
Another attempt writes the file from inside the split loop. In this case the resulting file is not a valid JSON list: it is just all the items one after another.
<route>
  <from uri="file:/mytestdirectory/?fileName=input.json" />
  <split streaming="true">
    <jsonpath writeAsString="true">$[*]</jsonpath>
    <unmarshal>
      <json unmarshalType="com.something.InputItem"
            namingStrategy="UPPER_CAMEL_CASE"></json>
    </unmarshal>
    <bean beanType="com.something.Processor"
          method="doSomething" />
    <marshal>
      <json namingStrategy="LOWER_CAMEL_CASE" useList="true" />
    </marshal>
    <to uri="file:/temp/to.json?fileExist=append"></to>
  </split>
</route>
So how can I do what I want with Camel? As this is a basic need for batch integration processing, I'm sure I'm missing something somewhere, as I'm very new to Camel...