
This question is about reading a file faster, not writing it. I have a 150 MB file that contains a JSON object. I currently use the following code to read it:

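import java.io.*;
import java.nio.charset.StandardCharsets;
import org.json.*;

String filename = "/tmp/fileToRead";
String decompressedString;
// The whole 150 MB file is one line, so readLine() returns it as one huge String
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileInputStream(filename), StandardCharsets.UTF_8))) {
    decompressedString = reader.readLine();
}
JSONObject obj = new JSONObject(decompressedString);
JSONArray profileData = obj.getJSONObject("profileData").getJSONArray("children");
// ....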

It is a single-line file and, since it is JSON, I can't split it (or at least I think so). Reading the file gives me either an OutOfMemoryError or a TLE (time limit exceeded). The file takes more than 7 seconds to read, which causes the TLE because the whole program is not allowed to run longer than 7 seconds. I get the OutOfMemoryError on `decompressedString = reader.readLine();`.

Is there a way to reduce the memory used, or the time it takes to read the file completely?

Belphegor21
  • Check this: http://stackoverflow.com/questions/1062113/fastest-way-to-write-huge-data-in-text-file-java?noredirect=1&lq=1 – Emil Hotkowski Apr 06 '17 at 11:12
  • This question is about reading a file, not writing it. How can it be a duplicate of a question about writing to a file? – Belphegor21 Apr 06 '17 at 11:15
  • @Rjiuk regarding the duplicate, "read" (in `BufferedReader`) is not the same as "write". – Olivier Grégoire Apr 06 '17 at 11:16
  • Do you need the entire JSON object in your app, or just a subset of it (e.g. some specific fields?) – Krešimir Nesek Apr 06 '17 at 11:17
  • @PrateekGupta What library are you using to read the JSON? Which library provides you your `JSONObject`? – Olivier Grégoire Apr 06 '17 at 11:18
  • Do not convert the file to a string. Use a JSON library that can load a `JSONObject` directly from a file. – Alexei Kaigorodov Apr 06 '17 at 11:18
  • I need the value of the `"profileData"` key and then of its `"children"` key. But those are what take up the space in the JSON: the other keys are metadata, this is the main data. – Belphegor21 Apr 06 '17 at 11:19
  • I use the `org.json` library; I got it from here: http://mvnrepository.com/artifact/org.json/json – Belphegor21 Apr 06 '17 at 11:21
  • In that case I'd agree with @AlexeiKaigorodov, use a library that supports reading JSON from an InputStream rather than just from the String to avoid reading (and loading) data twice. – Krešimir Nesek Apr 06 '17 at 11:21
  • Can you suggest one that doesn't lead me to change my code too much? – Belphegor21 Apr 06 '17 at 11:22
  • Don't use that library: as @AlexeiKaigorodov said, you need a library able to handle a stream. Use Gson, JSON.simple or Jackson. – Olivier Grégoire Apr 06 '17 at 11:22
  • Also, if you ever want to read *data*, never use `BufferedReader.readLine()` unless you're 100% sure each line is small enough (less than 4-8 KB). – Olivier Grégoire Apr 06 '17 at 11:28
  • OutOfMemoryError is not the same as TLE (Time Limit Exceeded). Your problem is not that reading the file takes too long; it's that the data in the file is larger than your allotted maximum heap size. To solve this, you need to increase the maximum heap size. And what makes you believe that the whole code cannot run for more than 7 seconds? – iavanish Apr 06 '17 at 11:29
  • @iavanish The whole code cannot run for more than 7 seconds because once it exceeds that the system kills it and produces a FATAL. These are company specs. I can't do anything about it. – Belphegor21 Apr 06 '17 at 11:32
  • @OlivierGrégoire, what should I use to read the file then? In this case the whole file is a single line that is 150 MB in size. I am a beginner in Java, so sorry if any of these questions are stupid. – Belphegor21 Apr 06 '17 at 11:34
  • Instead of `BufferedReader`, use the code given in the 4th solution at this link: http://www.geeksforgeeks.org/fast-io-in-java-in-competitive-programming/ It speeds up I/O and lets you read one token (or character, or integer, or ...) at a time. This will be faster than using any JSON library. – iavanish Apr 06 '17 at 11:37
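The comments above boil down to: parse from a stream instead of building a `String` first. Below is a minimal sketch of that with Gson, one of the libraries suggested, assuming Gson 2.8.6 or later is on the classpath (for the static `JsonParser.parseReader`); the class and method names are hypothetical. It stays close to the original org.json code. Note that it still builds the whole JSON tree in memory; only the intermediate 150 MB `String` copy is avoided.

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; assumes Gson 2.8.6+ for JsonParser.parseReader.
class ProfileDataLoader {

    // Parses the document straight from the Reader: no intermediate String is built.
    static JsonArray loadChildren(Reader in) {
        JsonObject obj = JsonParser.parseReader(in).getAsJsonObject();
        return obj.getAsJsonObject("profileData").getAsJsonArray("children");
    }

    // Same thing, reading the file as UTF-8 through a buffered reader.
    static JsonArray loadChildrenFromFile(String filename) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            return loadChildren(reader);
        }
    }
}
```

Usage would then be something like `JsonArray children = ProfileDataLoader.loadChildrenFromFile("/tmp/fileToRead");`.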

1 Answer


You have several problems at hand:

  1. You're preemptively parsing too much.

    The error already happens when you read the line, since you said "I get the OOM on `decompressedString = reader.readLine();`".

    You should never try to read raw data line by line. `BufferedReader.readLine()` keeps reading, and keeps everything it has read in memory, until it sees `\r`, `\n`, the sequence `\r\n`, or the end of the stream. When processing data of arbitrary length, you're never sure you'll get one of those characters. You're also never sure those characters won't appear inside the data itself. So your string may be too long or malformed. Don't pretend to know the format. `BufferedReader.readLine()` should be used when parsing, not when acquiring data.

  2. You're not using an appropriate library for your use case.

    Reading your JSON is important, yes, but you're reading too much at once. When creating your JSON object, you want to build it from a stream (an `InputStream`, a `Reader`, or any NIO `Channel`/`Buffer`).

    Currently you're building your JSON from a `String`. A huge one. So I can safely assume that at some point you'll need twice the memory you actually need: once for the `String` and once for the finished object.

    To reduce that, use an appropriate library to which you can pass one of the streams mentioned above. In my comments I mentioned Gson, JSON.simple and Jackson.

  3. Your file may be too big anyway.

    You want to acquire only a subset of your data (here, everything under `{"profileData":{"children": <DATA>}}`), but you probably receive way too much. How many elements exist at the same level as `profileData`? How many exist at the same level as `children`? Do you know? Probably way too many. Everything that is not under `profileData.children` is useless to you. What percentage of your total data is that? 50%? 90%? 99%?

    To solve this, you probably want one of two things: you want less data or you want to be able to focus your request.

    If you want less data, ask your data provider to give you less: only what you need. Why get more than that? It makes no sense. Tell him so and say "I want less".

    If you want focused data, use a library that lets you both parse and reduce the amount of data. You'd want a library that lets you say: "parse this JSON and return only the `profileData.children` element". Unfortunately I know of no library that does exactly that; if you do, please add a comment or an answer. Apparently Gson can do it if you drive its `JsonReader` yourself and selectively call `skipValue()`.
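A minimal sketch of that `JsonReader`/`skipValue()` approach, assuming Gson 2.8.6 or later and the `{"profileData":{"children":[...]}}` layout from the question (the class and method names are hypothetical): everything outside `profileData.children` is skipped as it streams past without building objects for it, and only the `children` array is turned into a tree.

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonParser;
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper; assumes Gson 2.8.6+ and the {"profileData":{"children":[...]}} layout.
class ChildrenExtractor {

    // Streams through the document and materializes only profileData.children.
    // Returns null if that path does not exist.
    static JsonArray extractChildren(Reader in) {
        try (JsonReader reader = new JsonReader(in)) {
            reader.beginObject();                       // top-level object
            while (reader.hasNext()) {
                if (!reader.nextName().equals("profileData")) {
                    reader.skipValue();                 // metadata: tokenized, never built
                    continue;
                }
                reader.beginObject();                   // inside profileData
                while (reader.hasNext()) {
                    if (reader.nextName().equals("children")) {
                        // Only this subtree becomes a JsonArray in memory.
                        return JsonParser.parseReader(reader).getAsJsonArray();
                    }
                    reader.skipValue();
                }
                reader.endObject();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here `children` is still materialized as a whole. Since the question says it holds most of the data, a further step would be to replace the `JsonParser.parseReader(reader)` call with `reader.beginArray()` and a `while (reader.hasNext())` loop that handles one child at a time.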

Olivier Grégoire