
OK, here is my code. My problem is that "records.csv" is a file that contains roughly 20 million lines, each made of 4 fields separated by a ','.

As you can see from the code, I'd like to have 4 ArrayLists, each holding all the values of one field. After a while the method stops working (I think because, to 'add' an element to the list, Java has a pointer that has to traverse the whole ArrayList first).

I need to solve this, but I don't know how.

Any suggestions?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class RecordReader {
    static ArrayList<String> id = new ArrayList <String> ();
    static ArrayList<String> field1 = new ArrayList <String> ();
    static ArrayList<String> field2 = new ArrayList <String> ();
    static ArrayList<String> field3 = new ArrayList <String> ();



    public static void Reader () {
        try {
        FileReader filein = new FileReader("Y:/datasets/records.csv");
        String token="";
        String flag = "id";
        int index=0, next;

        do {
            next = filein.read();

            if (next != -1) {

                if (next !=',' && next !='\n') 
                    token = token + next;

                else if (next == ','){
                    if (flag.compareTo("id")==0) {id.add (index, token); flag = "field1";}
                    else if (flag.compareTo("field1")==0) {field1.add (index, token); token=""; flag = "field2";}
                    else if (flag.compareTo("field2")==0) {field2.add (index, token); token=""; flag = "field3";}
                }

                else if (next == '\n') { 
                    if (flag.compareTo("field3")==0) {field3.add (index, token); token=""; flag = "id"; index++;} 
                }

                char nextc = (char) next; 
                System.out.print(nextc); 
                }
        } while (next!=-1);

        filein.close();
        }
        catch (IOException e) { System.out.println ("ERROR, you rascal!"); }
    }
}

I have to do it all at once; the file is 711000 bytes.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.nio.CharBuffer.wrap(Unknown Source)
    at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
    at sun.nio.cs.StreamEncoder.write(Unknown Source)
    at java.io.OutputStreamWriter.write(Unknown Source)
    at java.io.BufferedWriter.flushBuffer(Unknown Source)
    at java.io.PrintStream.write(Unknown Source)
    at java.io.PrintStream.print(Unknown Source)
    at RecordReader.Reader(RecordReader.java:42)
    at prova.main(prova.java:26)

ciurlaro
  • Is there a stack trace you could post? Also, do you absolutely have to have all of the data in memory at once? Most likely you're running out of memory and crashing the program. How large is the file in bytes? – NAMS Apr 21 '16 at 15:40
  • You may want to wrap your FileReader in a BufferedReader. – Palcente Apr 21 '16 at 15:48
  • I have to do it all at once; the file is 711000 bytes. `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.nio.CharBuffer.wrap(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.PrintStream.write(Unknown Source) at java.io.PrintStream.print(Unknown Source) at RecordReader.Reader(RecordReader.java:42) at prova.main(prova.java:26)` (I'll update the question) – ciurlaro Apr 21 '16 at 15:50
  • Your stack trace seems to point to `System.out.print(nextc);`. Can you verify that? Is that line 42? If so, comment out that line (I think you added it for debugging purposes?) and try again. 711 kB should be read and stored very easily, but printing all of that to the console may cause problems. – Tobias Brösamle Apr 21 '16 at 15:59
  • How do you run that? You need to increase the heap size. Check out this: http://stackoverflow.com/questions/2294268/how-can-i-increase-the-jvm-memory – alpert Apr 21 '16 at 16:08
  • Stopping is not the only problem: it gets REAAAALLY slow before that, and I'd like it to run fast. – ciurlaro Apr 21 '16 at 16:11
  • You don't have to increase heap size if you can write the code better to not use the whole heap to begin with. – NAMS Apr 21 '16 at 16:11
  • Okay, but I don't know what I could do to improve the code. – ciurlaro Apr 21 '16 at 16:12
  • I think the real question is - what are you REALLY trying to do? perhaps there's a better way to do it. What's your final goal? Are you, for example, going to write out 4 new text files with the contents of each array? Sort each one and get the middle value? etc. etc. Knowing what you're REALLY trying to do would help us help you. – Brian Pipa Apr 21 '16 at 16:41

2 Answers


I have a couple of suggestions for you.

First, you don't need 4 separate ArrayLists; one will do. Instead of calling filein.read() for every character, I would wrap your FileReader in a BufferedReader, read the file line by line, and add each line to a single ArrayList.

BufferedReader br = new BufferedReader(filein);
ArrayList<String> content = new ArrayList<String>();
String line = br.readLine();
while(line != null){
    //add lines to ArrayList
    content.add(line);
    line = br.readLine();
}

This reads the contents of the entire file into memory without the additional overhead of 3 extra ArrayLists.
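For reference, a self-contained version of the loop above might look like the sketch below. The class name `LineLoader`, the helper `readAllLines`, and the UTF-8 charset are assumptions made for illustration; only the file path comes from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LineLoader {

    // Reads every line of the file into one list.
    // try-with-resources closes the reader even if an exception is thrown.
    public static List<String> readAllLines(Path csv) throws IOException {
        List<String> content = new ArrayList<>();
        // Assumption: the file is UTF-8 encoded; change the charset if it is not.
        try (BufferedReader br = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                content.add(line);
            }
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        // Path from the question; pass another path as the first argument to override it.
        Path csv = Paths.get(args.length > 0 ? args[0] : "Y:/datasets/records.csv");
        List<String> content = readAllLines(csv);
        System.out.println("Lines read: " + content.size());
    }
}
```

Note that nothing is printed per character here; the console output is a single summary line.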

Second, since your fields are separated by a ',' and (I'm assuming) every line has the same number of fields, you can use the split() method to separate each line into an array of strings.

String[] record = content.get(index).split(",");
//record[0] = id
//record[1] = field1
//record[2] = field2
//record[3] = field3

Put the above into a loop and you can iterate over all of the file's contents. Since you know how the fields are ordered, retrieving the one you want is trivial.
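As a rough sketch of that loop: the class name `RecordSplitter` and the helper `splitRecords` are hypothetical, and skipping lines that do not have exactly four fields is an assumption about how malformed input should be handled, not something stated in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordSplitter {

    // Splits each line into its fields: [0] = id, [1] = field1, [2] = field2, [3] = field3.
    // Assumption: lines that do not have exactly four fields are skipped.
    public static List<String[]> splitRecords(List<String> content) {
        List<String[]> records = new ArrayList<>(content.size());
        for (String line : content) {
            // A limit of -1 keeps trailing empty fields, e.g. "1,a,b," still yields four entries.
            String[] record = line.split(",", -1);
            if (record.length == 4) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Small in-memory sample standing in for the lines read from the file.
        List<String> content = Arrays.asList("1,a,b,c", "2,d,e,f", "malformed line");
        for (String[] record : splitRecords(content)) {
            System.out.println("id=" + record[0] + " field3=" + record[3]);
        }
    }
}
```

Plain split(",") does not understand quoted fields that contain commas, so this only fits simple CSV data like the kind described in the question.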

However, I will warn you that with a sufficiently large file (multiple GB of data), this approach will eventually fail too, because the whole file still has to fit in the heap.
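If the final goal does not actually need every record in memory at once, one way around that limit is to handle each line as it is read and keep only a running result. The sketch below counts well-formed records; the class name `StreamingCounter` and the choice of "count the records" as the task are purely illustrative assumptions, since the question does not say what the lists are used for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingCounter {

    // Counts records with exactly four fields while holding only one line in memory at a time.
    public static long countRecords(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.split(",", -1).length == 4) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample; for the real file, pass new FileReader("Y:/datasets/records.csv") instead.
        Reader sample = new StringReader("1,a,b,c\n2,d,e,f\n");
        System.out.println("Records: " + countRecords(sample));
    }
}
```

Memory use stays roughly constant regardless of file size, at the cost of not being able to revisit earlier records.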

NAMS

Can you try running the application with the -Xmx option, as shown below?

java -Xmx6g [javaclassfile]

I was able to resolve a similar problem this way.
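To check whether the option actually took effect, a small program like the following can print the maximum heap the JVM reports. The class name `HeapCheck` is made up for this sketch; `Runtime.maxMemory()` is the standard API it relies on.

```java
public class HeapCheck {

    // Maximum amount of memory the JVM will attempt to use for the heap, in mebibytes.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once without and once with -Xmx (for example `java -Xmx6g HeapCheck`) should show different values; the exact number reported is usually a little below the requested size.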

Manas