
I have 100 million records in a JSON file and need the fastest, most efficient way to read the array of arrays from that file in Java.

The JSON file looks like:

[["XYZ",...,"ABC"],["XYZ",...,"ABC"],["XYZ",...,"ABC"],...,["XYZ",...,"ABC"],
 ["XYZ",...,"ABC"],["XYZ",...,"ABC"],["XYZ",...,"ABC"],...,["XYZ",...,"ABC"],
 ...
 ...
 ...
 ,["XYZ",...,"ABC"],["XYZ",...,"ABC"],["XYZ",...,"ABC"]]

I want to read this JSON file one inner array at a time, as if line by line:

read first:

["XYZ",...,"ABC"]

then:

["XYZ",...,"ABC"]

and so on:

...
...
...
["XYZ",...,"ABC"]

How do I read a JSON file like this? I know it does not completely look like a JSON file, but I need to read it in this format, and it is saved as .json.

Ali Azim
  • Do you mean JSON?? Or am I missing something? – DazstaV3 Apr 10 '17 at 16:29
  • Typing mistake, sorry: JSON. –  Apr 10 '17 at 16:31
  • First, the file should be on a fast SSD. Next you could try reading it with `BufferedReader` and see if that already gives you the maximum speed your SSD is able to deliver. If not try `FileChannel.map` and see how fast you can read from the resulting ByteBuffer. Btw. If you want to read the file more than once, make sure you have enough free RAM in your machine to allow the OS to buffer the whole file in memory. – Tesseract Apr 10 '17 at 16:35
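
The measurement suggested in the comment above could be sketched roughly as follows. This is only an illustration, not a benchmark harness: the class name `ReadThroughput` and both helper methods are hypothetical, the file is assumed to be UTF-8, and the mapped read is split into chunks because a single `FileChannel.map` call cannot cover more than `Integer.MAX_VALUE` bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class: reads a file twice and reports how long each way took.
public class ReadThroughput {

    /** Reads the whole file through a BufferedReader; returns the number of chars read. */
    static long readBuffered(Path file) throws IOException {
        char[] buffer = new char[1 << 16];
        long total = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads the whole file through memory-mapped chunks; returns the number of bytes read. */
    static long readMapped(Path file) throws IOException {
        final long maxChunk = 1L << 30; // 1 GiB per mapping (assumed chunk size)
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += maxChunk) {
                long length = Math.min(maxChunk, size - pos);
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
                while (map.hasRemaining()) {
                    int n = Math.min(buffer.length, map.remaining());
                    map.get(buffer, 0, n); // bulk copy out of the mapping
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ReadThroughput <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        long chars = readBuffered(file);
        long middle = System.nanoTime();
        long bytes = readMapped(file);
        long end = System.nanoTime();
        System.out.printf("BufferedReader:  %d chars in %d ms%n", chars, (middle - start) / 1_000_000);
        System.out.printf("FileChannel.map: %d bytes in %d ms%n", bytes, (end - middle) / 1_000_000);
    }
}
```

Note that the second read may be served from the OS file cache, so the order of the two measurements can affect the numbers; running each variant in a separate process is probably a fairer comparison.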

2 Answers


You can use the JSON Processing API (JSR 353) to process your data in a streaming fashion:

import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import javax.json.Json;
import javax.json.stream.JsonParser;

...

String dataPath = "data.json";

// The parser pulls one event at a time, so the whole file is never held in memory.
try (JsonParser parser = Json.createParser(new FileReader(dataPath))) {
    List<String> row = new ArrayList<>();

    while (parser.hasNext()) {
        JsonParser.Event event = parser.next();
        switch (event) {
            case START_ARRAY:
                // Start of the outer array or of a row; nothing to do.
                break;
            case VALUE_STRING:
                row.add(parser.getString());
                break;
            case END_ARRAY:
                // The outer array's END_ARRAY arrives with an empty row and is skipped.
                if (!row.isEmpty()) {
                    // Do something with the current row of data
                    System.out.println(row);

                    // Reset it (prepare for the new row)
                    row.clear();
                }
                break;
            default:
                throw new IllegalStateException("Unexpected JSON event: " + event);
        }
    }
}
zeppelin
  • I declared a HashMap and put the row values into it, but when I get a value from the HashMap it returns an empty List []. This happens because of the row.clear() statement. How can I solve this problem? –  Apr 11 '17 at 09:25
  • @AAKM Just re-create the row then, instead of clearing it: `row=new ArrayList<>()` (in the END_ARRAY block). And you better make sure that you have enough memory to store 100 million records in a giant HashMap. – zeppelin Apr 11 '17 at 09:33
  • I replaced the row.clear() line and ran it, but it takes a lot of time. I used a HashMap to search the data more efficiently; is there any other way to store huge data and search it efficiently using minimal memory? –  Apr 11 '17 at 10:04
  • @AAKM - it all depends on what you want to do with your data, HashMap is ok, as long you have enough memory for it. – zeppelin Apr 11 '17 at 10:09
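
The empty-list behaviour discussed in these comments can be reproduced without any JSON library. The sketch below is only an illustration: the class `RowAliasing`, its two methods, and the choice of the row's first element as the map key are all hypothetical. It shows that a map stores a reference to the list rather than a copy, so clearing the reused list also empties the stored value, while creating a new list per row does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the first element of each row is used as the key (an assumption).
public class RowAliasing {

    /** Buggy variant: one list is reused and cleared after every row. */
    static Map<String, List<String>> storeWithClear(List<List<String>> rows) {
        Map<String, List<String>> index = new HashMap<>();
        List<String> row = new ArrayList<>();
        for (List<String> source : rows) {
            row.addAll(source);
            index.put(row.get(0), row); // the map keeps a reference, not a copy
            row.clear();                // so this also empties the value just stored
        }
        return index;
    }

    /** Fixed variant: a fresh list is created for every row. */
    static Map<String, List<String>> storeWithNewList(List<List<String>> rows) {
        Map<String, List<String>> index = new HashMap<>();
        List<String> row = new ArrayList<>();
        for (List<String> source : rows) {
            row.addAll(source);
            index.put(row.get(0), row);
            row = new ArrayList<>();    // the stored list is left untouched
        }
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("k1", "a"),
                Arrays.asList("k2", "b"));
        System.out.println(storeWithClear(rows).get("k1"));   // prints []
        System.out.println(storeWithNewList(rows).get("k1")); // prints [k1, a]
    }
}
```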

You can use JsonSurfer to extract each inner JSON array with the JsonPath $[*]:

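    import org.jsfr.json.JsonPathListener;
    import org.jsfr.json.JsonSurfer;
    import org.jsfr.json.JsonSurferJackson;
    import org.jsfr.json.ParsingContext;

    // `json` is the JSON input to parse.
    JsonSurfer surfer = JsonSurferJackson.INSTANCE;
    surfer.configBuilder().bind("$[*]", new JsonPathListener() {
        @Override
        public void onValue(Object value, ParsingContext context) {
            // Called once for each inner array matched by $[*]
            System.out.println(value);
        }
    }).buildAndSurf(json);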

It won't load the entire JSON into memory; the inner arrays are processed one by one.
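
To illustrate what "one by one" means here, below is a dependency-free sketch that hands each inner array to a callback as soon as its closing bracket is read. It is not how JsonSurfer is implemented; the class `InnerArraySplitter` and its method are hypothetical, and the code assumes the input really is an array of arrays. It does no validation and returns the raw text of each inner array rather than parsed values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams the raw text of every inner array to a callback.
public class InnerArraySplitter {

    static void forEachInnerArray(Reader in, Consumer<String> onRow) throws IOException {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // 1 = inside the outer array, 2+ = inside an inner array
        boolean inString = false; // brackets inside string literals must not change depth
        boolean escaped = false;  // previous char was a backslash inside a string
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                if (depth >= 2) current.append(ch);
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
                if (depth >= 2) current.append(ch);
            } else if (ch == '[') {
                depth++;
                if (depth >= 2) current.append(ch);
            } else if (ch == ']') {
                if (depth >= 2) current.append(ch);
                depth--;
                if (depth == 1) {           // an inner array has just been closed
                    onRow.accept(current.toString());
                    current.setLength(0);   // only one row is buffered at a time
                }
            } else if (depth >= 2) {
                current.append(ch);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = new ArrayList<>();
        forEachInnerArray(new StringReader("[[\"XYZ\",\"A]B\"],[\"XYZ\",\"ABC\"]]"), rows::add);
        System.out.println(rows.size());  // prints 2
        System.out.println(rows.get(0));  // prints ["XYZ","A]B"]
    }
}
```

For a real file the `Reader` should be buffered (for example a `BufferedReader`), since the loop reads one character per call.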

Leo Wang
  • Does this Surfer support Java 8? I am getting the error `ANTLR Tool version 4.7.1 used for code generation does not match the current runtime version 4.8 ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8 ANTLR Tool version 4.7.1 used for code generation does not match the current runtime version 4.8 ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8 2021-06-23 21:42:46.673 INFO 23946 --- [ main] ConditionEvaluationReportLoggingListener :` – Prakash Raj Jun 24 '21 at 02:47