
I am writing a program that needs to read in very large files (about 150 MB of text). I am running into an out-of-memory error when I try to read in files that are larger than 50 MB. Here is an excerpt from my code.

    if (returnVal == JFileChooser.APPROVE_OPTION) {
        file = fc.getSelectedFile();
        gui.setTitle("Fluent Helper - " + file.toString());
        try {
            scanner = new Scanner(new FileInputStream(file));
            gui.getStatusLabel().setText("Reading Faces...");
            while (scanner.hasNext()) {
                count++;
                if (count < 1000000) {
                    System.gc();
                    count = 0;
                }
                readStr = scanner.nextLine() + "\n";
                if (readStr.equals("#\n")) {
                    isFaces = false;
                    gui.getStatusLabel().setText("Reading Cells...");
                } else if (isFaces) {
                    faces.add(new Face(readStr));
                } else {
                    cells.add(new Cell(readStr));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                scanner.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        System.out.println("file selected");
    } else {
        System.out.println("file not selected");
    }

The small block that calls the garbage collector every so many reads is something I added to try to solve the memory problem, but it doesn't work. Instead, the program hangs and never reaches the cells portion of the file (which should be processed in less than a second). Here is the block.

    if (count < 1000000) {
        System.gc();
        count = 0;
    }

My guess is that maybe the Scanner's pointer is getting garbage collected or something. I really don't have any clue. Launching the program with a larger heap is not really an option for me; the program should be usable by people without very much computer knowledge.

I would like a solution that gets the file in without a problem, whether that is a memory-management fix, a fix to the Scanner, or a more efficient means of reading the file. Thanks, everyone.

trincot
Michael

2 Answers


The GC will be called automatically when required, so calling it yourself will just slow down your application.

The problem is the amount of data you are retaining:

        faces.add(new Face(readStr));
    } else {
        cells.add(new Cell(readStr));

These collections are exceeding the maximum heap you have available. Can you try setting -mx1g to see if this makes a difference?

BTW: Why are you adding a \n to the end of each line?
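To confirm whether a larger heap setting actually took effect, one quick check (a sketch, not part of the original answer) is to print the JVM's reported maximum heap:

```java
// Minimal check that the -mx/-Xmx option took effect: print the maximum
// amount of memory the JVM is willing to use for the heap, in megabytes.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

If this prints far less than the value passed on the command line, the option is not reaching the JVM (an IDE launch configuration is a common culprit).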

Peter Lawrey
  • the constructors for the Face and Cell objects expect the \n at the end of the string – Michael Jun 15 '12 at 13:35
  • unfortunately the -mx1g did not yield any results. I actually think the heap was already sitting around 2g – Michael Jun 15 '12 at 13:46
  • is the \n a problem? I realize that it isn't that efficient but I haven't changed it because the container object's constructor calls those constructors and passes it a string with \n still attached at the end. – Michael Jun 15 '12 at 13:48
  • I don't see why reading a 50 MB file would use more than 2 GB. Can you use visualvm to see how much memory is used? Splitting `"abc".split("\n")` and `"abc\n".split("\n")` gives the same result. i.e. `new String[] { "abc" }` – Peter Lawrey Jun 15 '12 at 13:52
  • The memory problem may be because the cell objects actually allocate an array of size 500 to store cells that are nearby as part of a search algorithm that is used later on. – Michael Jun 15 '12 at 14:01
  • 1
    Got rid of the array, worked like a charm. Much faster too. I didn't use it anymore anyway, I came up with a better solution. This program is getting to big I kind of forgot that array was even there :). Thanks for all the help. – Michael Jun 15 '12 at 14:03
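The fix described in the last comment can be sketched as deferring the 500-slot neighbour array until the search algorithm actually asks for it. The class shape below (field and method names included) is hypothetical, since Michael's actual Cell code is not shown:

```java
// Hypothetical sketch: instead of every Cell eagerly allocating a 500-slot
// neighbour array at parse time, allocate it lazily on first use. With
// hundreds of thousands of cells, the eager arrays dominate the heap.
public class Cell {
    private final String data;
    private Cell[] neighbours;          // stays null until actually needed

    public Cell(String line) {
        this.data = line;               // no large allocation while reading
    }

    public Cell[] getNeighbours() {
        if (neighbours == null) {
            neighbours = new Cell[500]; // deferred allocation
        }
        return neighbours;
    }
}
```

Dropping the array entirely, as Michael ultimately did, is of course cheaper still.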

Calling the garbage collector yourself is usually not a good idea; this question explains why: Why is it bad practice to call System.gc()?

Have you tried increasing the maximum heap size, for instance with -Xmx1g for 1 gigabyte?

jayeff
  • I am using Eclipse as an IDE, and it appears to be allocating much more memory for the heap than 1 GB, or at least that's what a call to System.out.println("max memory: " + java.lang.Runtime.getRuntime().maxMemory()); has led me to believe. – Michael Jun 15 '12 at 13:58
  • 1
    Not too sure, what maxMemory tells you in depth. I noticed you append a newline to the line you get from the scanner. Because strings are immutable in Java, this creates a new object. You could check first for if the line starts with `#` and only add the newline if you really call the Face/Cell constructors. – jayeff Jun 15 '12 at 14:07
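jayeff's suggestion can be sketched as follows: compare the raw line against the marker first, and build the `line + "\n"` string only when a value is actually stored. This is a self-contained sketch, not Michael's program; a StringReader stands in for the real file, and plain strings stand in for the Face/Cell objects:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ConditionalNewline {
    // Holder for the two sections of the parsed file.
    static class Result {
        final List<String> faces = new ArrayList<>();
        final List<String> cells = new ArrayList<>();
    }

    // Tests the raw line against "#" first, and appends the "\n" only
    // when a value is actually stored, so no throwaway strings are made
    // for the marker line.
    static Result read(Reader source) throws IOException {
        Result result = new Result();
        boolean isFaces = true;
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals("#")) {                 // compare before appending
                isFaces = false;
            } else if (isFaces) {
                result.faces.add(line + "\n");      // append only when storing
            } else {
                result.cells.add(line + "\n");
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Result r = read(new StringReader("f1\nf2\n#\nc1\n"));
        System.out.println(r.faces.size() + " faces, " + r.cells.size() + " cells");
        // prints: 2 faces, 1 cells
    }
}
```

BufferedReader.readLine() already strips the terminator, matching Scanner.nextLine() in the question, so the behaviour of the original loop is preserved.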