
I'm trying to process a very large list of items from a JSON file and store the resulting list as another JSON file. But I cannot figure out how to do this without loading everything into memory (and memory is the key constraint here).

In plain Java, this would be like:

  • load a buffer from the input file (a simple [ {item1}, {item2} ] list) and parse it with a streaming parser
  • for each item in the buffer, call the unitary processing service
  • add each result to the writer buffer of the output JSON file
  • loop until the input is exhausted
  • close the JSON writer (which writes the closing ] of the list to the file).

Camel seems to have streaming support for reading (actually I'm not even sure it works: jsonpath seems to load everything into memory too... but that's another issue), but it seems I cannot merge the results without holding them all in memory before the actual write.

My first attempt uses an aggregation strategy on the split. The result file is correct, but everything is clearly held in memory, so it runs out of memory (OOM):

    <route>
        <from uri="file:/mytestdirectory/?fileName=input.json" />
        <split streaming="true" aggregationStrategy="#class:com.MyListAggregator">
            <jsonpath writeAsString="true">$[*]</jsonpath>

            <unmarshal>
                <json unmarshalType="com.something.InputItem"
                    namingStrategy="UPPER_CAMEL_CASE"></json>
            </unmarshal>

            <bean beanType="com.something.Processor"
                method="doSomething" />
        </split>

        <marshal>
            <json namingStrategy="LOWER_CAMEL_CASE" useList="true" />
        </marshal>
        <to uri="file:/temp?fileName=to.json" />
    </route>

Another attempt writes to the file from inside the split loop. In this case the resulting file is not a valid JSON list: it just contains all the items one after another.

    <route>
        <from uri="file:/mytestdirectory/?fileName=input.json" />
        <split streaming="true">
            <jsonpath writeAsString="true">$[*]</jsonpath>

            <unmarshal>
                <json unmarshalType="com.something.InputItem"
                    namingStrategy="UPPER_CAMEL_CASE"></json>
            </unmarshal>

            <bean beanType="com.something.Processor"
                method="doSomething" />

            <marshal>
                <json namingStrategy="LOWER_CAMEL_CASE" useList="true" />
            </marshal>

            <to uri="file:/temp?fileName=to.json&amp;fileExist=Append" />
        </split>
    </route>
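
One way the second attempt might be turned into a valid JSON list (an untested sketch, not a confirmed solution) is to write the list delimiters from inside the split itself: prepend [ to the first item, prepend , to every later item, and append ] after the last one. It relies on the CamelSplitIndex and CamelSplitComplete exchange properties that the Splitter sets on each sub-exchange (CamelSplitComplete is also set in streaming mode); com.something.InputItem and com.something.Processor are the same placeholders as in the routes above. This only addresses the output side; the jsonpath split expression may still load the whole input into memory.

```xml
<route>
    <from uri="file:/mytestdirectory/?fileName=input.json" />
    <split streaming="true">
        <jsonpath writeAsString="true">$[*]</jsonpath>

        <unmarshal>
            <json unmarshalType="com.something.InputItem"
                namingStrategy="UPPER_CAMEL_CASE" />
        </unmarshal>

        <bean beanType="com.something.Processor" method="doSomething" />

        <!-- useList only affects unmarshalling, so it is omitted here -->
        <marshal>
            <json namingStrategy="LOWER_CAMEL_CASE" />
        </marshal>

        <!-- First item opens the list; every later item is prefixed with a comma -->
        <choice>
            <when>
                <simple>${exchangeProperty.CamelSplitIndex} == 0</simple>
                <transform><simple>[${bodyAs(String)}</simple></transform>
            </when>
            <otherwise>
                <transform><simple>,${bodyAs(String)}</simple></transform>
            </otherwise>
        </choice>

        <!-- The last item closes the list; other items pass through unchanged -->
        <filter>
            <simple>${exchangeProperty.CamelSplitComplete} == true</simple>
            <transform><simple>${bodyAs(String)}]</simple></transform>
        </filter>

        <!-- Each item is appended to the same file, so only one item is in memory at a time -->
        <to uri="file:/temp?fileName=to.json&amp;fileExist=Append" />
    </split>
</route>
```

Because every write appends, an existing /temp/to.json from a previous run would have to be deleted or moved first, otherwise the new list is appended to the old content. An empty input array would produce no output file at all, since no sub-exchange reaches the file endpoint.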

So how can I do what I want with Camel? Since this is a basic need for batch integration processing, I'm sure there is something I'm missing; I'm very new to Camel...

Marcanpilami
  • This may be related: [jsonpath to split messages](https://stackoverflow.com/a/54248063) – dank8 Feb 20 '23 at 13:01
  • @dank8 thanks but that answer is about splitting the input file, not merging the results inside one file as in my issue. – Marcanpilami Feb 20 '23 at 16:46
  • I'd suggest taking a look at https://stackoverflow.com/questions/67501221/modify-and-re-write-key-from-json-file/67502489#67502489 and https://stackoverflow.com/questions/66678062/apply-a-mask-on-a-json-to-keep-only-mandatory-data/66708567#66708567. A small change would be required to read more than one file and to tune the modification/transformation if needed. – AnatolyG Feb 22 '23 at 17:51
  • 1
  • jsonpath does not support streaming in their library; there is a GitHub issue about it, but the project is not very active. – Claus Ibsen Mar 25 '23 at 21:54
