
I have written a simple Java program that reads a million rows from a database and writes them to a file.

The maximum heap this program can use is 512 MB.

I frequently see the program run out of memory once it has processed more than 500K rows.

Since the program is very simple, it is easy to verify that it has no memory leak. It fetches a thousand rows from the database, writes them to a file using streams, and then fetches the next thousand rows. The size of each row varies, but none of the rows is huge. A heap dump taken while the program is running shows the older strings still on the heap. These strings are unreachable, which means they are waiting to be garbage collected. I also believe that the GC doesn't necessarily run during the execution of this program, which leaves strings on the heap longer than they should be.

I think the solution would be to use long char arrays (or `StringBuffer`) instead of `String` objects to store the lines returned by the DB. The assumption is that I can overwrite the contents of a char array, so the same array can be reused across iterations without allocating new space each time.

Pseudocode:

  1. Create an array of arrays using `new char[1000][1000]`.
  2. Fill the array with a thousand rows from the DB.
  3. Write the array to the file.
  4. Reuse the same array for the next thousand rows.
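The buffer-reuse idea in the steps above could be sketched roughly as follows. This is only an illustration under assumptions: the class `ReusedBufferCopy`, the method `copy`, and the in-memory `StringReader`/`StringWriter` stand-ins for the database and the file are all hypothetical and not taken from the actual program. With JDBC, a `Reader` for a column could be obtained from `ResultSet.getCharacterStream`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class ReusedBufferCopy {

    // Copies everything from 'in' to 'out' through the caller's buffer,
    // so no new char[] is allocated per row. Returns the number of chars copied.
    public static long copy(Reader in, Writer out, char[] buffer) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        char[] buffer = new char[1000];          // allocated once, reused for every row
        StringWriter file = new StringWriter();  // stand-in for the output file
        String[] rows = {"first row", "second row", "third row"}; // stand-in for DB rows
        for (String row : rows) {
            try (Reader in = new StringReader(row)) {
                copy(in, file, buffer);
                file.write('\n');
            }
        }
        System.out.print(file);
    }
}
```

Note that this only avoids per-row allocations; it does not change what the garbage collector is able to reclaim.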

If the above pseudocode fixes my problem, then in reality the immutable nature of the `String` class hurts the Java programmer, as there is no direct way to reclaim the space used by a `String` even when it is no longer in use.

Are there any better alternatives for solving this problem?

P.S.: I didn't rely on static analysis alone. I used the YourKit profiler to inspect a heap dump. The dump clearly shows that 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. Also, I don't use `substring` in my code.

Geek
  • You should post your code first. I suspect you have a leak, meaning that you somehow keep references to the strings you already dealt with (written to the database). Java's GC will make sure it disposes of objects (Strings included) if you no longer have references to them. Your Out Of Memory problem comes from somewhere else. – Tudor Vintilescu Oct 16 '12 at 08:27
  • It is never easy to find out whether a program has a memory leak just by static analysis and your `OutOfMemoryError` **proves that you indeed have a memory leak**. Without your code, however, there will be no useful advice beyond that. – Marko Topolnik Oct 16 '12 at 08:29
  • No, I didn't rely on static analysis alone. I used the YourKit profiler to inspect a heap dump. The dump clearly shows that 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. – Geek Oct 16 '12 at 08:32
  • Strings are empty shells on their own. The real question is what is going on with their internal char arrays. These are shared among string instances, so it is quite possible that your live string instances hold on to old char arrays. This can happen if you save strings that came about from `substring`, `trim` or other methods called upon your input strings. – Marko Topolnik Oct 16 '12 at 08:34
  • I think the hard truth is that no one will be satisfied with the mere assertion that you have ruled out memory leaks. We want to see the code, or some other code that reproduces the same problem. – johusman Oct 16 '12 at 08:39
  • @johusman : I get your point, mate. Let me paste my code; I will have to change some stuff before I can post it. It is much easier to believe that there is a memory leak :-) In reality the GC runs sparingly in large applications, sometimes only once or twice a day. Most people don't know it ;-) – Geek Oct 16 '12 at 08:41
  • @Geek However infrequently the GC runs, you can rest assured that this **cannot possibly be the cause of an `OutOfMemoryError`**. – Marko Topolnik Oct 16 '12 at 08:51
  • @Geek - I would expect minor garbage collections to be running regularly - not once or twice a day – Brian Agnew Oct 16 '12 at 08:51
  • @Geek post the code already, damned suspense is annoying me! – Thihara Oct 17 '12 at 05:55

3 Answers


Immutability of the class String has absolutely nothing to do with OutOfMemoryError. Immutability means that it cannot ever change, only that.

If you run out of memory, it is simply because the garbage collector was unable to find any garbage to collect.

In practice, it is likely that you are holding references to way too many Strings in memory (for instance, do you have any kind of collection holding strings, such as List, Set, Map?). You must destroy these references to allow the garbage collector to do its job and free up some memory.
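As a rough illustration of the point about collections, here is a minimal sketch of a batch loop that drops its references after each batch. The class `BatchExport`, the method `export`, and the in-memory row source are hypothetical stand-ins, since the original code was never posted.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchExport {

    // Writes rows in batches of batchSize and returns the number of rows written.
    // The batch list is cleared after each flush, so written rows become unreachable.
    public static long export(Iterator<String> rows, Writer out, int batchSize) throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        long written = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize || !rows.hasNext()) {
                for (String row : batch) {
                    out.write(row);
                    out.write('\n');
                }
                written += batch.size();
                batch.clear(); // drop the references; without this every row stays alive
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        List<String> source = new ArrayList<>(); // stand-in for the database
        for (int i = 0; i < 2500; i++) {
            source.add("row-" + i);
        }
        StringWriter out = new StringWriter();   // stand-in for the output file
        System.out.println(export(source.iterator(), out, 1000)); // prints 2500
    }
}
```

If the `batch.clear()` call were missing, or the rows were also added to some longer-lived `List`, `Set` or `Map`, every row read so far would remain reachable and the heap would eventually fill up.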

Bruno Reis

The simple answer to this question is 'no'. I suspect you're hanging onto references longer than you think.

Are you closing those streams properly? Are you `intern()`ing those strings? That would result in a permanent copy of the string being made if one doesn't already exist, taking up permgen space (which isn't collected). Are you taking a `substring()` of a larger string? Strings make use of the flyweight pattern and will share a character array if created using `substring()`. See here for more details.
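To make the `substring()` concern concrete, here is a small sketch of the usual workaround of copying the substring into a fresh `String`. The class `SubstringCopy` and the method `detachedPrefix` are hypothetical names. To my knowledge the array sharing applies only to older JDKs (before Java 7 update 6); on later versions `substring()` already copies, so the extra copy is harmless but unnecessary.

```java
public class SubstringCopy {

    // Returns the first 'length' chars of 'line' as a String that does not
    // keep the (possibly much larger) backing array of 'line' alive.
    public static String detachedPrefix(String line, int length) {
        int end = Math.min(length, line.length());
        return new String(line.substring(0, end));
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            big.append('x');
        }
        String key = detachedPrefix(big.toString(), 10);
        System.out.println(key.length());
    }
}
```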

You suggest that garbage collection isn't running. The option -verbose:gc will log the garbage collections and you can see immediately what's going on.
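Besides the `-verbose:gc` flag, collection counts can also be read from inside the program through the standard `java.lang.management` API. The following is only a sketch; the class `GcCounter` and the method `totalCollections` are hypothetical names chosen for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounter {

    // Sums the collection counts of all collectors the JVM reports.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        for (int i = 0; i < 100_000; i++) {
            String garbage = new String(new char[1024]); // unreachable after this iteration
            if (garbage.length() != 1024) {
                throw new IllegalStateException();
            }
        }
        long after = totalCollections();
        System.out.println("collections during loop: " + (after - before));
    }
}
```

Printing this count every few thousand rows in the export loop would show whether collections are actually happening while the program runs.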

Brian Agnew

The only thing about strings which can cause an OutOfMemoryError is if you retain small sections of a much larger string. If you are doing this it should be obvious from a heap dump.

When you take a heap dump, I suggest you look only at live objects; any retained objects you don't need are then most likely due to a bug in your code.

Peter Lawrey
  • I didn't rely on static analysis alone. I used the YourKit profiler to inspect a heap dump. The dump clearly shows that 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. – Geek Oct 16 '12 at 08:33
  • I don't use Substring anywhere :-) – Geek Oct 16 '12 at 08:33
  • If you have objects which are waiting to be cleaned up, they will be collected on the next GC and won't trigger an OutOfMemoryError. If that is your concern, you should only take a dump of live objects (after a Full GC). – Peter Lawrey Oct 16 '12 at 08:36
  • It is difficult to believe that objects on the heap waiting to be garbage collected wouldn't cause an Out Of Memory. Why do you say that? – Geek Oct 16 '12 at 08:38
  • A Full GC is *always* run before an OutOfMemoryError is triggered for the heap. – Peter Lawrey Oct 16 '12 at 08:46
  • @Geek, that is simply how the JVM works: it only ever throws OOM if a thread tries to allocate some memory from the heap and there's no memory available even after performing a Full GC, which, by definition, reclaims all the memory that can possibly be freed. – Bruno Reis Oct 16 '12 at 08:46
  • Bruno, Peter : This answer was helpful. I am refactoring my code and will paste it soon. – Geek Oct 16 '12 at 08:48
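
The claim in the comments above, that unreachable objects alone cannot exhaust the heap, can be tried out with a small sketch like the one below. The class `GarbageChurn` and the method `churn` are hypothetical names. Run with a small heap (for example `-Xmx512m`), it allocates far more than the heap size in total, yet nothing is retained, so it should finish without an `OutOfMemoryError`. The JIT may also optimize some of the allocations away, which does not change the outcome.

```java
public class GarbageChurn {

    // Allocates 'count' arrays of 'size' chars without keeping any of them.
    // Returns the total number of chars allocated.
    public static long churn(int count, int size) {
        long total = 0;
        char[] current = null;
        for (int i = 0; i < count; i++) {
            current = new char[size]; // the previous array becomes unreachable here
            total += current.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // 2,000 arrays of 1M chars each is roughly 4 GB of allocation in total.
        System.out.println(churn(2000, 1 << 20));
    }
}
```

Changing `churn` to add each array to a list that outlives the loop would make the same program fail, which is the difference between garbage and a leak.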