
I am trying to read several big files (over 100 MB), and so far it always crashes in the middle with an OutOfMemoryError. Are there any solutions?

    FileInputStream fstream = new FileInputStream(f);
    // Get the object of DataInputStream
    DataInputStream dain = new DataInputStream(fstream);
    // BufferedReader br = new BufferedReader(new InputStreamReader(in));

    BufferedReader in = new BufferedReader(new InputStreamReader(dain));
    String text = in.readLine();
    while (text != null) {
        stat(text);
        text = in.readLine();
    }

The Exception is like this:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:2694)
        at java.lang.String.<init>(String.java:234)
        at java.io.BufferedReader.readLine(BufferedReader.java:349)
        at java.io.BufferedReader.readLine(BufferedReader.java:382)

Here is what Stat does:

    public void stat(String text) {
        String postTypeId = this.getXmlValue(text, "PostTypeId");
        String viewCountStr = this.getXmlValue(text, "ViewCount");
        String answerCountStr = this.getXmlValue(text, "AnswerCount");
        String userId = this.getXmlValue(text, "OwnerUserId");
        String postId = this.getXmlValue(text, "Id");
        String parentId = this.getXmlValue(text, "ParentId");
        String backUpId = this.getXmlValue(text, "LastEditorUserId");
        // Add post rel
        if (parentId == null) {
            if (!postTable.containsKey(postId))
                postTable.put(postId, new PostRel());
        } else {
            try {
                postTable.get(parentId).addAnswer(postId);
            } catch (Exception exp) {
            }
        }
        generalCount(postTypeId, viewCountStr, answerCountStr, userId, postId, parentId, backUpId);
    }

And in generalCount, I insert into another table:

    if (userTable.containsKey(userId)) {
        userTable.get(userId).addPost(postId);
        if (parentId != null)
            userTable.get(userId).addAnswer(parentId);
    } else {
        UserPostInfo newInfo = new UserPostInfo();
        newInfo.addPost(postId);
        if (parentId != null)
            newInfo.addAnswer(parentId);
        userTable.put(userId, newInfo);
    }
faz
    Is it possible that the file contains huge lines? That's *likely* if the file is not actually a text file. Also: are you holding onto any data inside the `stat` method? – Joachim Sauer Nov 03 '11 at 14:47
  • What does the `stat()` method do? – Matt Ball Nov 03 '11 at 14:51
  • Please don't use DataInputStream to read text. Unfortunately examples like this get copied again and again, so can you remove it from your example? http://vanillajava.blogspot.co.uk/2012/08/java-memes-which-refuse-to-die.html – Peter Lawrey Jan 31 '13 at 00:08
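For reference, here is a sketch of the same read loop without the DataInputStream wrapper, as the comment above suggests (the explicit charset is an assumption; the original code relied on the platform default):

    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), "UTF-8"));
    try {
        String text;
        while ((text = in.readLine()) != null) {
            stat(text);
        }
    } finally {
        in.close();
    }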

1 Answer

  1. Give the JVM more memory to work with
  2. Use less memory while reading the files (can you work with streaming data instead?)
  3. Work with memory-mapped files
Matt Ball
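As a rough illustration of option 3, here is a minimal sketch that memory-maps the file with NIO instead of streaming it through a reader (the file name and the byte-by-byte line splitting are illustrative; a single mapping is limited to Integer.MAX_VALUE bytes, which is plenty for files of a few hundred MB):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedFileDemo {
        public static void main(String[] args) throws IOException {
            File f = new File("posts.xml");              // illustrative file name
            FileInputStream fis = new FileInputStream(f);
            FileChannel channel = fis.getChannel();
            try {
                // Map the file read-only: the OS pages the data in on demand,
                // so the file contents never have to sit on the Java heap.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                StringBuilder line = new StringBuilder();
                while (buffer.hasRemaining()) {
                    char c = (char) buffer.get();        // fine for ASCII-only data
                    if (c == '\n') {
                        // stat(line.toString());        // hand one line at a time to stat
                        line.setLength(0);
                    } else {
                        line.append(c);
                    }
                }
            } finally {
                channel.close();
                fis.close();
            }
        }
    }

Note that mapping only keeps the raw file bytes off the heap; any tables built up while processing the lines still have to fit in heap memory.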
  • Thanks a lot for your reply, could you be more specific with the streaming data option? I am not quite familiar with it. The file I read is the data file provided by stackoverflow, which is in XML form, and I think they should be all right because it is likely they are generated automatically from Stackoverflow's database. I am not holding data in the stat method, the stat method is grabbing some information from the line, and storing them into a hashtable, which I think would not waste too much memory. – faz Nov 03 '11 at 15:05
  • @faz the hashtable is almost certainly the culprit. I've fallen into this trap myself before. Please edit your question to show - at least at a high level - what `stat()` does. – Matt Ball Nov 03 '11 at 15:08
  • I have added the details of stat. Is it because the hashtable is taking too much memory, especially when I use an ArrayList as the value type? – faz Nov 03 '11 at 15:22
  • If I had to guess: yes, it certainly looks like the table is simply growing too large. Try suggestion #1. Also, I recommend using Google Guava's [Multimap](http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html) instead of rolling your own (see the sketch after these comments). – Matt Ball Nov 03 '11 at 15:28
  • The first one is not working; it seems 1024m is still not enough. I will try the Multimap suggestion, thanks! – faz Nov 03 '11 at 15:43
  • Multimap won't make a memory difference. Just how many elements are you trying to store in the map? [Turn on `-XX:+HeapDumpOnOutOfMemoryError`](http://stackoverflow.com/questions/5554341/set-a-jvm-to-dump-heap-when-outofmemoryerror-is-thrown) and check out the resulting heap dump to see what's actually eating your heap space. – Matt Ball Nov 03 '11 at 16:14
  • I added `-XX:+HeapDumpOnOutOfMemoryError` in Eclipse, and the dump file is over 200 MB, so I could not read it. Is that normal, or did I use it the wrong way? – faz Nov 04 '11 at 17:53
  • What program did you try to use to open the heap dump? You should use [VisualVM](http://download.oracle.com/javase/7/docs/technotes/guides/visualvm/index.html) or [MAT](http://www.eclipse.org/mat/) (if you're using Eclipse). – Matt Ball Nov 04 '11 at 19:28
  • I just located the problem: the Hashtable is growing too fast. I even changed the value type from an object to a String, but it does not change much. Is there any way to fix this? – faz Nov 04 '11 at 20:22
  • Pretty much what I said in my answer. Either use less memory, or give the JVM more memory. How many entries did the table contain when the JVM OOME'd? How much heap space did the JVM have (it sounded like 256 mb)? – Matt Ball Nov 04 '11 at 20:24
  • I think it was about 160,000 entries at that time, and I set the JVM to 1024M using -Xmx1024m, but somehow it crashes at about 300M. – faz Nov 04 '11 at 20:37
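For reference, a minimal sketch of the Multimap suggestion from the comments above (the class and field names are illustrative, not the OP's actual code; assumes Guava is on the classpath):

    import com.google.common.collect.ArrayListMultimap;
    import com.google.common.collect.Multimap;

    public class UserPostIndex {
        // One multimap replaces a Hashtable<String, List<String>> plus the
        // containsKey/put boilerplate for creating the list on first use.
        private final Multimap<String, String> postsByUser = ArrayListMultimap.create();
        private final Multimap<String, String> answersByUser = ArrayListMultimap.create();

        public void record(String userId, String postId, String parentId) {
            postsByUser.put(userId, postId);             // creates the value collection lazily
            if (parentId != null) {
                answersByUser.put(userId, parentId);
            }
        }
    }

As noted in the comments, this is a readability improvement rather than a memory saving; the 160,000-plus entries still have to fit in the heap.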