1

I am using StringBuilder, reading each tweet from a file and writing it to another file after filtering it. I am also flushing my StringBuilder at the end of each loop iteration. I am on a Mac (Retina, mid 2012) with 8 GB of RAM.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuffer.append(StringBuffer.java:322)
    at java.io.BufferedReader.readLine(BufferedReader.java:363)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)
    at Parser.main(Parser.java:52)
  • Post your code. It really doesn't look like you're flushing your `StringBuffer`. For 5 GB you may temporarily need as much as three times that: your buffer at nearly 5 GB can be resized to 10 GB, which makes 15 GB in total (assuming a growth factor of 2). – maaartinus Apr 26 '14 at 11:37
  • Check this thread to allocate more memory http://stackoverflow.com/questions/2610194/how-can-i-give-eclipse-more-memory-than-512m – fasadat Apr 26 '14 at 11:38
  • The code has been posted. – user3575840 Apr 26 '14 at 11:41
  • You're storing things inside a LinkedHashSet in memory, without ever removing anything from it. That's probably where the memory problem comes from. – JB Nizet Apr 26 '14 at 12:00
  • Okay, what is your suggestion then? I don't think that's the problem, I am trying to use a basic file reader code from the docs and even that doesn't seem to work with the 5gb file. – user3575840 Apr 26 '14 at 12:02
  • What do you mean? You mean that just reading the lines from the file, and doing nothing with them, cause an OOME? – JB Nizet Apr 26 '14 at 12:08
  • I am doing a lot with them. Can't you see my code I posted. How about I use the SPLIT on my Mac (the unix command split) and do it? – user3575840 Apr 26 '14 at 12:21

2 Answers

1

Sounds like you've got a memory leak. It's hard to give you specific code advice without source code, but perhaps something is holding a reference to your StringBuilder even after it's flushed? VisualVM is a good, free tool that can be used to track down where this kind of problem occurs at runtime. This blog post covers how to do that: http://rejeev.blogspot.com/2009/04/analyzing-memory-leak-in-java.html

kgilmer
  • I tried allocating 20 GB using -Xmx20g, but it still says Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit – user3575840 Apr 26 '14 at 11:42
0

From the structure of the program, we can conclude that the memory hog is either an object that gets bigger in each iteration of the loop (case 1), or an object that gets big within a single iteration (case 2).

The stack trace indicates a failed memory allocation when the BufferedReader tried to resize its internal character buffer to accommodate a line of input. How long is this line at the time of failure? You can find out by running your program in a debugger with an exception breakpoint on OutOfMemoryError and inspecting the variable that holds the size of the array that could not be allocated. If it isn't huge, we can rule out case 2.
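
If a debugger is not at hand, the line lengths can also be checked directly. The following is only a sketch, not code from the question: the class name `LineLengthScan` and the method `longestLine` are made up for illustration. It reads the input one character at a time, so it never has to hold a whole line in memory, and reports the length of the longest line it saw. It treats both `\n` and `\r` as line ends.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not part of the original program.
final class LineLengthScan {

    /** Returns the length in chars of the longest line, without buffering any line. */
    static long longestLine(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        long longest = 0;
        long current = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n' || c == '\r') {
                // End of a line: remember it if it is the longest so far.
                longest = Math.max(longest, current);
                current = 0;
            } else {
                current++;
            }
        }
        // The last line may not end with a line terminator.
        return Math.max(longest, current);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineLengthScan <file>");
            return;
        }
        // FileReader uses the platform default charset, which is good enough for counting.
        try (Reader r = new FileReader(args[0])) {
            System.out.println("longest line: " + longestLine(r) + " chars");
        }
    }
}
```

If the reported length is in the hundreds of millions or more, the file probably lacks the line terminators that readLine() expects, which would point to case 2.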

The most likely suspect for case 1 is the LinkedHashSet storing the tweet_f for all tweets in the output. Try estimating its size (a rough estimate in bytes can be obtained with ln.size() * (50 + 2 * average string length in chars)), and ensure you have sufficient memory to hold it.
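
The estimate above can be written out as a small helper. This is a sketch under assumptions: the class `SetSizeEstimate` and its method are hypothetical, the constant 50 is the rough per-entry overhead from the formula, and 2 bytes per char assumes strings backed by UTF-16 char arrays (newer JVMs with compact strings may use less for Latin-1 text). Note that size * (50 + 2 * average length) is the same as 50 * size + 2 * total length, so no division is needed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper illustrating the rough estimate; not a precise measurement.
final class SetSizeEstimate {

    // Assumed per-entry overhead in bytes (set entry, String object, array header).
    static final long OVERHEAD_BYTES = 50;

    /** Rough heap footprint in bytes: size * (50 + 2 * average string length). */
    static long estimateBytes(Set<String> set) {
        long totalChars = 0;
        for (String s : set) {
            totalChars += s.length();
        }
        return OVERHEAD_BYTES * set.size() + 2 * totalChars;
    }

    public static void main(String[] args) {
        Set<String> ln = new LinkedHashSet<>(Arrays.asList("first tweet", "second tweet"));
        System.out.println("estimated set size: " + estimateBytes(ln) + " bytes");
        // Compare against the maximum heap the JVM will use (controlled by -Xmx).
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

Calling such a helper every few hundred thousand lines and comparing the result with Runtime.getRuntime().maxMemory() shows whether the set is on track to exhaust the heap.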

If that fails, I'd get out the heavy tools, i.e. take a heap dump, load it in an analysis tool such as VisualVM or a commercial profiler, and ask that tool to identify the large objects and the references that prevent their garbage collection.
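
One way to obtain a heap dump is to start the JVM with -XX:+HeapDumpOnOutOfMemoryError, which writes a dump when the error occurs. A dump can also be triggered from code on HotSpot-based JVMs. The following is a minimal sketch under that assumption: the class name `HeapDumper` is made up, and the call fails if the target file already exists (recent JDKs also expect the name to end in .hprof).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; works only on JVMs that expose the HotSpot diagnostic bean.
final class HeapDumper {

    /**
     * Writes a heap dump to the given path.
     * liveOnly = true dumps only objects that are still reachable.
     */
    static void dumpHeap(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // The output file must not exist yet.
        String path = args.length > 0 ? args[0] : "parser-heap.hprof";
        dumpHeap(path, true);
        System.out.println("heap dump written to " + path);
    }
}
```

The resulting .hprof file can then be opened in VisualVM to look for the largest objects and the references that keep them alive.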

meriton
  • 1. Is there a VisualVM for Mac? 2. I am not storing all tweets in the LinkedHashMap; it only stores tweets that match the filter. As you can see, it is doing .add(blah) inside the if block. Leaving this piece of code aside, I am running the standard BufferedReader hello-world example, reading the same file and printing out the lines. Even that fails. – user3575840 Apr 26 '14 at 12:29
  • 1. Being part of the Oracle JDK, I'd expect [JVisualVM](http://docs.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html) to be available on Mac as well. 2. Did you follow my suggestion to check that the line lengths are reasonable? – meriton Apr 26 '14 at 14:28