
I have a big JSON file, about 40 GB in size. When I try to convert this file, an array of objects, to a list of Java objects, it crashes. I've tried every maximum heap size (-Xmx) setting, but nothing has worked!

public Set<Interlocutor> readJsonInterlocutorsToPersist() {
    String userHome = System.getProperty(USER_HOME);
    log.debug("Read file interlocutors " + userHome);
    try {
        ObjectMapper mapper = new ObjectMapper();
        // Deserialize the entire JSON file into a Set of Java objects
        Set<Interlocutor> interlocutorDeEntities = mapper.readValue(
                new File(userHome + INTERLOCUTORS_TO_PERSIST),
                new TypeReference<Set<Interlocutor>>() {
                });
        return interlocutorDeEntities;
    } catch (Exception e) {
        // Pass the exception itself (not e.getMessage()) so the stack trace is logged
        log.error("Exception while reading InterlocutorsToPersist file.", e);
        return null;
    }
}

Is there a way to read this file using a BufferedReader and then to push objects one by one?

Mirlo

2 Answers


You should definitely have a look at the Jackson Streaming API (https://www.baeldung.com/jackson-streaming-api). I have used it myself for JSON files several GB in size. The great thing is that you can divide your JSON into several smaller JSON objects and then parse each of them with mapper.readTree(parser). That way you can combine the convenience of normal Jackson with the speed and scalability of the Streaming API.

Related to your problem:

I understand that you have a really large array (which is the reason for the file size) made up of individual objects that are much smaller and easier to handle:

e.g.:

[       // 40 GB
    {}, // only 400 MB
    {},
]

What you can do now is parse the file with Jackson's Streaming API and walk through the array, while each individual element is still parsed as a "regular" Jackson object and then processed easily.
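
For illustration, here is a minimal sketch of that approach, assuming an Interlocutor POJO as in the question and a file containing a single top-level array (the class name and process method are placeholders):

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class InterlocutorStreamReader {

    private final ObjectMapper mapper = new ObjectMapper();

    public void readInterlocutors(File jsonFile) throws IOException {
        try (JsonParser parser = mapper.getFactory().createParser(jsonFile)) {
            // Advance to the '[' that opens the top-level array.
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a top-level JSON array");
            }
            // Each iteration lands on the '{' of one array element,
            // so only a single element is held in memory at a time.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Interlocutor interlocutor = mapper.readValue(parser, Interlocutor.class);
                process(interlocutor); // persist, aggregate, discard, ...
            }
        }
    }

    private void process(Interlocutor interlocutor) {
        // Placeholder: handle one object at a time here.
    }
}

The key point is that mapper.readValue(parser, ...) consumes exactly one element of the array, so the heap only ever holds the current object instead of the whole 40 GB document.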

You may also have a look at Use Jackson To Stream Parse an Array of Json Objects, which matches your problem pretty well.

Ayk Borstelmann
  • Your solution also works, but my objects have many dependencies (objects nested inside others); that's why I need a way to read and convert them to objects. Thanks. – Mirlo Jul 01 '20 at 11:51
  • Well, with this solution you could also read all objects into a set. It is actually the same solution as the one you found, but using `Jackson` instead of `Gson`. – Ayk Borstelmann Jul 01 '20 at 13:16

Is there a way to read this file using a BufferedReader and then to push objects one by one?

Of course not. Even if you could open and read this file, how would you store 40 GB worth of Java objects in memory? You most likely don't have that much memory in your machine (and technically, with ObjectMapper you would need about twice as much working memory: 40 GB to hold the JSON plus 40 GB for the resulting Java objects, i.e. about 80 GB).

I think you should use one of the approaches from these questions, but store the information in a database or in files instead of in memory. For example, if the JSON contains millions of rows, you should parse and save each row to the database without keeping them all in memory. Then you can fetch this data from the database step by step (for example, no more than 1 GB at a time).
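
As a rough sketch of that parse-and-persist idea, combining the streaming parse from the other answer with plain JDBC (the jdbc:postgresql URL, the interlocutors table, its name column, and the getName() getter are all hypothetical placeholders):

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InterlocutorImporter {

    private static final int BATCH_SIZE = 10_000;

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "password"); // hypothetical DB
             JsonParser parser = mapper.getFactory().createParser(new File(args[0]));
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO interlocutors(name) VALUES (?)")) { // hypothetical table

            conn.setAutoCommit(false);
            parser.nextToken(); // move past the opening '[' of the array

            int pending = 0;
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Interlocutor i = mapper.readValue(parser, Interlocutor.class);
                stmt.setString(1, i.getName()); // hypothetical getter
                stmt.addBatch();
                if (++pending == BATCH_SIZE) {
                    stmt.executeBatch(); // flush a batch so rows don't pile up in memory
                    conn.commit();
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                stmt.executeBatch();
                conn.commit();
            }
        }
    }
}

This way the memory footprint stays bounded by the batch size, no matter how large the input file is.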

Slava Vedenin
  • Well, it is possible in theory, as SAX (for XML) proves. Of course you can't have the entire document in memory at once, but you could read parts of the structure, write them into a database / into smaller documents for individual objects, drop them from memory, and repeat. I do not know of any implementation that does this, though. –  Jul 01 '20 at 10:39