15

I have a large text file with 20 million lines of text. When I read the file using the following program, it works just fine, and in fact I can read much larger files with no memory problems.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " +e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}

However, if I need to append some records to this file before reading it, the BufferedReader consumes a huge amount of memory (I have just used Windows Task Manager to monitor this, not very scientific I know, but it demonstrates the problem). The amended program is below; it is the same as the first one, except that I append a single record to the file first.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    PrintWriter pw = null;
    try {
        pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
        pw.println(" ");
    } catch (Exception e) {
        System.out.println("pw error: " + e.getMessage());
    } finally {
        pw.close();
    }

    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " +e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}

[Screenshot of Windows Task Manager: the large bump in the line shows the memory consumption when I run the second version of the program.]

So I was able to read this file without running out of memory. But I have much larger files with more than 50 million records, and these hit an out-of-memory error when I run this program against them. Can someone explain why the first version of the program works fine on files of any size, while the second version behaves so differently and ends in failure? I am running on Windows 7 with:

java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)

Roman C
Wee Shetland
  • Is it the `BufferedReader` that takes all the memory? I'd rather suspect it'd be the `FileWriter` doing this. – david a. Aug 30 '12 at 17:39
  • Is there a reason for adding a `BufferedWriter` into the mix? Do you still get the same problem if you do `new PrintWriter(new FileWriter(...))`? – ᴇʟᴇvᴀтᴇ Aug 30 '12 at 17:51
  • (Nothing to do with the question, but I have to point out that you could get an NPE in the finally block. The way to deal with this is to use Java SE 7's try-with-resources, or with Java SE 6 use separate try's for the finally and catch and avoid the use of nulls; a minimal sketch follows these comments.) – Tom Hawtin - tackline Aug 30 '12 at 17:54
  • Seems curious, since neither version of the code is actually doing anything. – Hot Licks Aug 30 '12 at 18:01
  • @TomHawtin-tackline -- All that's really necessary is to condition the close statements with `if (pw != null)`, etc. – Hot Licks Aug 30 '12 at 18:03
  • I've tested the second version on a file of about 1.3 GB with more than 30 million lines and it runs fine. Heap consumption about 60 MB. Java 6 / Linux X86 – Oleg Kandaurov Aug 30 '12 at 18:08
  • @HotLicks You still end up with a situation where you are likely to make the same error. You are also left with the very strange exception handling. – Tom Hawtin - tackline Aug 30 '12 at 18:11
  • Are we sure it's actually the heap which is huge? Appending to the file may require all sorts of rearrangement on disc which will be cached in RAM. – Tom Hawtin - tackline Aug 30 '12 at 18:13
  • @TomHawtin-tackline, that's what I was thinking as well though still not sure how to go about proving it. – Cypher Aug 30 '12 at 18:15
  • @TomHawtin-tackline -- Exception handling requires thought. – Hot Licks Aug 30 '12 at 18:19
  • I suspect that the system is running out of disk while rewriting the large file, and Java heap just gets to be the bearer of bad tidings since it maxes out trying to raise the I/O exception. How much free disk is on the system, in comparison to the file being modified? – Hot Licks Aug 30 '12 at 18:24
  • (I just ran 10.1 million lines through the program with no problem.) – Hot Licks Aug 30 '12 at 18:26
  • @Hot Licks it's not a disk space problem, as I have more than 500 GB free on the C: drive. – Wee Shetland Aug 30 '12 at 18:35
  • @TomHawtin hmm, I think you might be on to something with it being a RAM issue rather than a heap issue. I will profile the program and see how much heap is actually being used. – Wee Shetland Aug 30 '12 at 18:36
  • @tony_h -- Keep in mind that a file over 2-something GB in size will cause overflows in 32-bit file size counters, creating unpredictable havoc. – Hot Licks Aug 30 '12 at 18:55
  • I have profiled the program, and the heap size never gets above 150 MB, so it's not a heap issue. The PrintWriter append section on its own runs in a millisecond and never has memory issues. The BufferedReader section on its own never has memory issues. When the two are run consecutively, the BufferedReader section consumes a massive amount of RAM. I still have no idea why. – Wee Shetland Aug 30 '12 at 19:34
  • You're sure you're showing us the WHOLE program, with NO changes? – Hot Licks Aug 30 '12 at 19:44
  • I don't see any reason to assume it's the BufferedReader, or indeed anything in that code. BufferedReader memory use is capped at 4096 chars unless you have amazingly long lines. Contrary to another suggestion, I don't see why appending to a disk file should require any disk rearrangement at all, let alone any startling memory use, and certainly Java doesn't do any of it, so it wouldn't cause an OOM. – user207421 Aug 30 '12 at 23:12
  • @Hot Licks Yep, I'm running those two programs EXACTLY as I have shown you here. – Wee Shetland Aug 31 '12 at 08:59
  • Well, with a one-character change to the above code one can easily create an out of memory error. – Hot Licks Aug 31 '12 at 11:56
  • Question: Do you have any sort of disk "enhancement" installed on this box? Some sort concurrent backup tool, live encryption, mirroring, etc? – Hot Licks Aug 31 '12 at 11:57
  • @Hot Licks No, I have nothing like that running. I have decided to use an alternative approach, as I can't seem to get a specific reason why the program behaves the way it does. I am coming to the conclusion that this is not a Java problem, but an OS-specific one, to do with how Windows 7 handles disk reading and caching on this particular machine. Thanks to all who have pitched in with ideas and help. – Wee Shetland Aug 31 '12 at 18:11
  • Yep, I suspect that, for some reason, Windows believes it must keep a backup copy of the file. Precisely why is hard to guess -- it could be, eg, some sort of backup tool, an odd version of the file system you're running, etc. Likely something you've forgotten about. – Hot Licks Aug 31 '12 at 18:26
  • From the screenshot you can't even see for sure that the Java process is eating up the RAM. As previously suggested it could be an OS process. Could you use a profiler such as VisualVM ( http://visualvm.java.net/ ) to confirm that the RAM is being allocated to the heap, and post results? If it is the heap, then you could do a "heap dump". (If it's not even the java process, then something like "Process Hacker" - http://processhacker.sourceforge.net/ - would clearly confirm which process was hogging the RAM). – laher Sep 03 '12 at 00:23
  • Please post the stack trace from the exception or error you are getting. – Roland Illig Sep 08 '12 at 09:49
  • @tony_h : Have a look at this SE question : http://stackoverflow.com/questions/1062113/fastest-way-to-write-huge-data-in-text-file-java – Ravindra babu Mar 09 '16 at 22:33
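
As an aside on the NPE comment above, here is a minimal sketch of the read section using Java SE 7's try-with-resources (same behaviour as the question's code, but the reader is closed automatically and there is no null handling in a finally block):

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    int lineCount = 0;
    // try-with-resources closes br automatically, even if an exception is thrown,
    // so there is no need for a null check or an explicit close() in a finally block
    try (BufferedReader br = new BufferedReader(new FileReader(tempFile))) {
        String tempLine;
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    }
    System.out.println(lineCount + " lines read from file");
}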

6 Answers

1

You can start the Java VM with the VM option

-XX:+HeapDumpOnOutOfMemoryError

This will write a heap dump to a file when an OutOfMemoryError occurs; the dump can then be analysed to find leak suspects.

Use a '+' to enable a boolean option of this kind and a '-' to disable it.

If you are using Eclipse, the Memory Analyzer (MAT) plugin can also take heap dumps from running VMs, with some nice analyses for leak suspects etc.
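
For example, assuming the program's main class is called LineCounter (a placeholder name, not from the question), the command would look roughly like this; -XX:HeapDumpPath is optional and only controls where the .hprof file is written:

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\dumps LineCounter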

Zim-Zam O'Pootertoot
jethroo
0

Each time you execute the following Java statement, you are creating a brand new object:

tempLine = br.readLine()

I believe each call to readLine() probably creates a new String object, which is left on the heap each time the reference is reassigned to tempLine.

Therefore, since GC isn't being run constantly, thousands of objects can be left on the heap within seconds.

Some people say it's a bad idea to call System.gc() every 1000 lines or so, but I would be curious whether that fixes your issue. Also, you could run this statement after each line to basically mark each object as garbage-collectable:

tempLine = null;
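
A minimal sketch of what that suggestion would look like in the read loop, assuming the same tempFile as in the question (several commenters doubt this is the real cause, so treat it purely as an experiment):

int lineCount = 0;
try (BufferedReader br = new BufferedReader(new FileReader(tempFile))) {
    String tempLine;
    while ((tempLine = br.readLine()) != null) {
        lineCount += 1;
        tempLine = null;            // drop the reference to the line straight away
        if (lineCount % 1000 == 0) {
            System.gc();            // only a hint; the VM is free to ignore it
        }
    }
}
System.out.println(lineCount + " lines read from file");
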
djangofan
  • I don't think that's the problem. When I run the read-only version of the program, the BufferedReader works just fine with no memory problems at all. The problem only occurs when I precede the reading of the file with a section that appends a line to the file using a PrintWriter. – Wee Shetland Aug 31 '12 at 08:58
  • What is your line count at the point of the exception? Also, if you use JDK 1.6.0_22 or higher, I believe you get a multithreaded garbage collector, and I am curious what behavior you get with that. Also, doesn't BufferedWriter allow you to increase the buffer size? Alternative: try using InputStreamReader and FileInputStream to read and then store the data in a char, then just write that char using a FileOutputStream. – djangofan Aug 31 '12 at 15:09
0
     pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));

Did you try not using a BufferedWriter? If you're only appending a few lines to the end, maybe you don't need a buffer. If you do, consider using a byte array (collections or a StringBuilder). Finally, did you try the same in Java 1.6.0_32? It might be a bug in the new version of one of the Writers.

Can you print the free memory before and after pw.close()?

System.out.println("before wr close : " + Runtime.getRuntime().freeMemory());

and print similar lines after the writer close and after the reader close.
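
That instrumentation could be wrapped in a tiny helper so it's easy to drop in around the existing close() calls (a sketch; the helper name is made up here):

// Hypothetical helper: logs how much free heap the VM currently reports
static void logFreeMemory(String label) {
    System.out.println(label + " : " + Runtime.getRuntime().freeMemory() + " bytes free");
}

Call it as logFreeMemory("before pw close"), logFreeMemory("after pw close"), and logFreeMemory("after br close") in the second program. Note that freeMemory() only reports free space inside the current heap, so it is a rough indicator rather than a measure of the process's total RAM use.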

tgkprog
0

It could be that your file does not contain any line feed/carriage return characters at all. In that case, readLine() tries to build one single String out of the entire file, and that is probably what runs out of memory.

Javadoc of readLine():

Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
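
A quick way to verify that the file really does contain line terminators, without ever building a large String, is to count line-feed bytes while streaming the file (a sketch, using the same tempFile as in the question):

// Stream the file byte by byte and count '\n' characters;
// no Strings are built, so memory use stays flat
long newlines = 0;
try (InputStream in = new BufferedInputStream(new FileInputStream(tempFile))) {
    int b;
    while ((b = in.read()) != -1) {
        if (b == '\n') {
            newlines++;
        }
    }
}
System.out.println(newlines + " line feeds found");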

Chris
  • That's not the problem, unfortunately; the files are all properly delimited, and I'm getting the correct line counts as I parse the files. – Wee Shetland Sep 11 '12 at 17:42
0

Have you tried:

A) creating a new File instance to use for the reading, but pointing to the same file, and
B) reading an entirely different file in the second part.

I'm wondering whether the File object is still somehow attached to the PrintWriter, or whether the OS is doing something funny with the file handles. Those tests should show you where to focus.

This doesn't look to be a problem with the code, and your logic for thinking it shouldn't break seems sound, so it's got to be some underlying functionality.
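
For test A, the only change to the second program would be something like this (a minimal sketch; the read section gets its own File instance so it shares nothing with the one handed to the PrintWriter):

// Test A: a fresh File instance pointing at the same path
File readFile = new File("temp.dat");
int lineCount = 0;
try (BufferedReader br = new BufferedReader(new FileReader(readFile))) {
    String tempLine;
    while ((tempLine = br.readLine()) != null) {
        lineCount += 1;
    }
}
System.out.println(lineCount + " lines read from file");

For test B, the same code would simply point at a different, similarly sized file.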

Link19
  • Thanks @Glen Lamb, I think your suggestions make a lot of sense. However I had already spent too much time on this issue and finally decided to do it another way which avoided this problem altogether. If I ever get time to return to it, I'll post any results I get. – Wee Shetland Sep 11 '12 at 17:40
-3

You'll need to start Java with a bigger heap. Try -Xmx1024m as a parameter on the java command.

Basically, you're going to need more memory than the size of the file.
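
For example, assuming the main class is called LineCounter (a placeholder name, not from the question), the invocation would be:

java -Xmx1024m LineCounter

This caps the maximum heap at 1024 MB.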

Dan
  • Can you explain why I need a bigger heap for the 2nd program but not the 1st? The 1st version of the program works just fine, and uses a very small heap size. The BufferedReader processes the file one line at a time, so it shouldn't need much memory at all. – Wee Shetland Aug 30 '12 at 17:38