
So the idea is that I have this .json file that I need to read. It is so big that I can't even open it in Notepad or Visual Studio Code.

I tried this:

BufferedReader in = new BufferedReader(new FileReader("path to the file"));
String line = in.readLine();

and I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3536)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:735)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:227)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:372)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:133)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.TagValues.listFilesForFolder(TagValues.java:129)
    at com.ReadJSON.Main.main(Main.java:18)

I searched on the internet, and some solutions were to change the memory settings, but that doesn't work; it returns the same error. Another problem is that the entire file is ONE LINE. The entire content of the file is written in a single line. I think I have to break the reading of the line at a certain point so it doesn't go over the maximum allocated memory, store the value, and start reading again from where I left off, doing this over and over until the end of the line.

Any suggestions on how I should read this file? Should I try a different way to read it, or is there a trick to break up readLine()?

Thanks!

Alexandru DuDu
  • The solution of increasing the memory might be acceptable for small files, not for big files. For big files, you will need to use a stream and read the file chunk by chunk. – sgtcortez Dec 01 '20 at 12:34
  • Possible duplicate: https://stackoverflow.com/questions/2356137/read-large-files-in-java – sgtcortez Dec 01 '20 at 12:34

3 Answers


For such a huge JSON file, one should not read the entire JSON DOM (document object model) into memory, but use a streaming parser instead.

BufferedReader with readLine would be wrong anyway if there is only one huge line. Also, JSON files are in general in UTF-8 encoding, while FileReader is an old utility class that uses the platform's default character encoding: not portable code, so wrong here.
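To make both points concrete (the class and method names here are my own invention, not from the question): a reader opened with an explicit UTF-8 charset and drained in fixed-size char chunks keeps memory bounded even when the whole file is one gigantic line, because nothing ever accumulates the line into a single String. A minimal JDK-only sketch:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedRead {

    // Reads the file in fixed-size char chunks with an explicit UTF-8 decoder,
    // so memory use stays bounded even when the entire file is one huge line.
    // Returns the total number of chars read, as a stand-in for real processing.
    static long process(Path file) throws IOException {
        long total = 0;
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // chunk[0..n) holds the current piece; hand it to a streaming
                // parser here instead of appending it to one growing String
                total += n;
            }
        }
        return total;
    }
}
```

In a real program the loop body would feed each chunk to a streaming JSON parser (which is what Jackson does internally when given a Reader or InputStream) rather than count characters.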

There is the Jackson Streaming API. For a project using Maven:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.11.3</version>
</dependency>

The code would be something like:

JsonFactory factory = new JsonFactory();
try (JsonParser parser = factory.createParser(...)) {
    while (parser.nextToken() != JsonToken.END_OBJECT) {
        String field = parser.getCurrentName();
        switch (field) {
        case "...":
            ...
            ... parser.getText();
            ... parser.getIntValue();
            break;
        }
    }
}

This way you can extract just the part of the data you need, or store the data in a database as it streams by.

Joop Eggen
  • I've got an off-topic but a bit related question: does Jackson provide any API to read huge property names and huge string literals? – terrorrussia-keeps-killing Dec 01 '20 at 12:45
  • I never needed huge tokens, so you would need to try it yourself. There were some binary data stored as Base64, which were large. I would think there might be an Integer.MAX_VALUE limit for Strings. – Joop Eggen Dec 01 '20 at 13:06

Even though you can increase the JVM memory limit, doing so is needless: allocating a huge amount of memory like 1 GB just to process a file is overkill and resource-intensive. Read the stream chunk by chunk instead:

InputStream inFileReader = channelSFtp.get(path); // file read over SFTP
byte[] localbuffer = new byte[2048];

int bytesRead;
while (-1 != (bytesRead = inFileReader.read(localbuffer))) {
    // deal with the current (up to) 2 KB chunk: localbuffer[0..bytesRead)
}

inFileReader.close();

This way you can read it piece by piece.

Zendem
  • Technically, this is good advice for copying streams or pushing read byte buffers downstream, but it seems unrelated to the OP's issue. The OP seems to get burned at the bottleneck of collecting the entire file into a single line, whereas the JSON stuff (according to the provided stack trace) is most likely able to work with input streams and readers directly. – terrorrussia-keeps-killing Dec 01 '20 at 12:53

You can check out the DSM streaming library. It lets you process a JSON document while parsing it: you define a mapping in YAML for the data you want to process, and DSM processes the JSON document based on that mapping file. Under the hood, DSM uses the Jackson Streaming API.

You can check the example in this question:

JAVA - Best approach to parse huge (extra large) JSON file

mfe