
I was recently asked an interview question that had to do with reading from a CSV file and summing up entries in certain cells. When asked to optimize it, I couldn't answer how to deal with the case of running out of memory if we were given a CSV of, say, 100 gigs.

In Java, how exactly does reading from a file work? How do we know when something is too big, and how do we deal with it? I was told that you could pass around the intermediate reader object instead of trying to read the entire thing into memory?

Jeff Gong
    Process one row at a time. – user207421 Oct 06 '15 at 00:23
  • You do something like [this](http://stackoverflow.com/a/309718/940217), except instead of appending to a StringBuilder, do the summation calculation on the spot. Trying to store the whole input file in memory is what would cause trouble. – Kyle Falconer Oct 06 '15 at 00:23
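Following the approach suggested in the comments, here is a minimal sketch of summing a single column while streaming the file row by row, so only the current line is ever held in memory. The file name and column index are illustrative, and real CSV data with header rows or quoted commas would need a proper CSV parser:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CsvColumnSum {
    public static void main(String[] args) throws IOException {
        double total = 0;
        // readLine() pulls one row at a time from the buffered stream,
        // so memory use stays roughly constant no matter how large the file is.
        try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
            String row;
            while ((row = reader.readLine()) != null) {
                String[] cells = row.split(",");
                // Sum the third cell (index 2) of every row as an example target column.
                total += Double.parseDouble(cells[2].trim());
            }
        }
        System.out.println("Sum: " + total);
    }
}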

3 Answers


The interviewer gave you a hint - BufferedReader. It is an efficient choice for reading a large file line by line.

Small example:

String line;
BufferedReader br = new BufferedReader(new FileReader("c:/test.txt"));
while ((line = br.readLine()) != null) {
   //do processing
}
br.close();

See the BufferedReader documentation for details.
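On Java 7 and later, a try-with-resources block is a safer variant of the same loop, since the reader is closed even if processing throws. A minimal sketch (java.io.BufferedReader and java.io.FileReader imports assumed):

try (BufferedReader br = new BufferedReader(new FileReader("c:/test.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // do processing
    }
}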

sam

There are several ways to read from a file in Java. Some of them involve keeping all of the file's lines (or data) in memory as you "read" the data delimited by something like a newline character (reading line by line, for example).

For large files you want to process smaller chunks at a time, using the Scanner class (or something like it that reads a limited number of bytes at a time).

Sample code:

FileInputStream inputStream = new FileInputStream(path);
Scanner sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
   String line = sc.nextLine();
   // process the line here instead of storing it
}
sc.close(); // closing the Scanner also closes the underlying stream
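Scanner buffers the underlying stream itself and can also parse numeric tokens directly, which suits the cell-summation case. A small sketch, assuming a file of comma-separated numeric cells (the file name and delimiter pattern are illustrative; locale-dependent number formats are not handled):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ScannerSum {
    public static void main(String[] args) throws FileNotFoundException {
        double total = 0;
        try (Scanner sc = new Scanner(new File("data.csv"), "UTF-8")) {
            // Treat commas and line breaks alike as token separators.
            sc.useDelimiter("[,\\r\\n]+");
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    total += sc.nextDouble(); // numeric cell: add to the running sum
                } else {
                    sc.next();                // non-numeric cell (e.g. a header): skip it
                }
            }
        }
        System.out.println("Sum: " + total);
    }
}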
shafeen

You can use RandomAccessFile to read the file. It may not be the best solution though.
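For what it's worth, a minimal sketch of reading line by line with RandomAccessFile (the file path is illustrative). Note that its readLine() decodes bytes as Latin-1, so it is a poor fit for UTF-8 text, and it does no buffering, which usually makes it slower than BufferedReader for sequential reads:

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessRead {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("data.csv", "r")) {
            String line;
            while ((line = raf.readLine()) != null) {
                // process one line at a time; raf.seek(pos) could jump to another
                // byte offset, which is what RandomAccessFile is really for
            }
        }
    }
}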

Yibin Lin