2

I'm trying to implement multi-threading for some task parallelism in a program i am writing. The program is utilizes Spring framework and runs on Pivotal Cloud Foundry. It was crashing occasionally so i went in and viewed the logs and performance metrics; which is when i discovered it has a memory leak. after some testing i narrowed down the culprit to my threading implementation. My understanding of the GC in the JVM is that it will not dispose of a thread that is not dead, nor will it dispose of any object that is still being referenced by another object or later line of executable code. I'm not holding any reference to the thread at all however, and if i do it claims to put itself into a dead state once it finishes running so I dont know what is causing the leak.

I've written a clean PoC to demonstrate the leak. it uses a rest controller so i can control the number of threads, a runnable class because my real program requires parameters, and a string to take up arbitrary space in memory that would be held by other fields in the real program (makes the leak more apparent).

package com.example;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LeakController {

    @RequestMapping("/Run")
    public String DoWork(@RequestParam("Amount") int amount, @RequestParam("Args") String args)
    {
        for(int i = 0; i < amount; i++)
            new Thread(new MyRunnable(args)).start();
        return "Workin' on it";
    }

    public class MyRunnable implements Runnable{
        String args;
        public MyRunnable(String args){ this.args = args; }
        public void run()
        {
            int timeToSleep = Integer.valueOf(args);
            String spaceWaster = "";
            for (int i = 0; i < 10000; i ++)
                spaceWaster += "W";
            System.out.println(spaceWaster);
            try {Thread.sleep(timeToSleep);} catch (InterruptedException e) {e.printStackTrace();}
            System.out.println("Done");
        }
    }
}

Can anyone explain why this program leaks memory?

Edit: I've gotten a few responses about String assignment vs string building and the string pool so i changed my code to the following

        int[] spaceWaster = new int[10000];
        for (int i = 0; i < 10000; i ++)
            spaceWaster[i] = 512;
        System.out.println(spaceWaster[1]);

and it still leaks.

Edit: In acquiring some real numbers to respond to Voo with i noticed something interesting. calling new threads starts eating memory up but only to a point. after permanently growing about 60mb the new integer based program stops growing further regardless of how hard it is pushed. Does this have something to do with the way the spring framework allocates memory?

I also think there is merit in going back to the String example as it more closely relates the my real use case; which is to do regex operations on incoming JSON, several hundred such JSON a second. With that in mind i have changed the code to:

@RestController
public class LeakController {

    public static String characters[] = {
            "1","2","3","4","5","6","7","8","9","0",
            "A","B","C","D","E","F","G","H","I","J","K","L","M",
            "N","O","P","Q","R","S","T","U","V","W","X","Y","Z"};
    public Random rng = new Random();

    @RequestMapping("/Run")
    public String GenerateAndSend(@RequestParam("Amount") int amount)
    {
        for(int i = 0; i < amount; i++)
        {
            StringBuilder sb = new StringBuilder(100);
            for(int j = 0; j< 100; j++)
                sb.append(characters[rng.nextInt(36)]);
            new Thread(new MyRunnable(sb.toString())).start();
            System.out.println("Thread " + i + " created");
        }
        System.out.println("Done making threads");
        return "Workin' on it";
    }

    public class MyRunnable implements Runnable{
        String args;
        public MyRunnable(String args){ this.args = args; }
        public void run()
        {
            System.out.println(args);
            args = args.replaceAll("\\d+", "\\[Number was here\\]");
            System.out.println(args);
        }
    }
}

This new application exhibits similar behavior as the integer example in that it grows about 50mb permanently (after 2000 threads), and tapers off from there until i cant notice any memory growth with each new batch of 1000 threads (about 85mb past original deploy memory).

if i change it to remove stringbuilder:

String temp = "";
for(int j = 0; j< 100; j++)
    temp += characters[rng.nextInt(36)];
new Thread(new MyRunnable(temp)).start();

it leaks indefinitely; i assume that once all 36^100 strings have been generated once that it will stop.

Combining those findings i guess my real issue could be both an issue with the string pool and an issue with how spring allocates memory. What i am still failing to understand is that in my real application if i make a runnable and call run() on the main thread the memory doesnt seem to spike, but if i make a new thread and give it the runnable then the memory jumps. Heres what my runnable looks like currently in the application i am building:

public class MyRunnable implements Runnable{
    String json;
    public MyRunnable(String json){
        this.json = new String(json);
    }
    public void run()
    {
        DocumentClient documentClient = new DocumentClient (END_POINT,
                MASTER_KEY, ConnectionPolicy.GetDefault(),
                ConsistencyLevel.Session);
        System.out.println("JSON : " + json);
        Document myDocument = new Document(json);
        System.out.println(new DateTime().toString(DateTimeFormat.forPattern("MM-dd-yyyy>HH:mm:ss.SSS"))+">"+"Created JSON Document Locally");
        // Create a new document
        try {
            //collectioncache is a variable in the parent restcontroller class that this class is declared inside of
            System.out.println("CollectionExists:" + collectionCache != null);
            System.out.println("CollectionLink:" + collectionCache.getSelfLink());
            System.out.println(new DateTime().toString(DateTimeFormat.forPattern("MM-dd-yyyy>HH:mm:ss.SSS"))+">"+"Creating Document on DocDB");
            documentClient.createDocument(collectionCache.getSelfLink(), myDocument, null, false);
            System.out.println(new DateTime().toString(DateTimeFormat.forPattern("MM-dd-yyyy>HH:mm:ss.SSS"))+">"+"Document Creation Successful");
            System.out.flush();
            currentThreads.decrementAndGet();
        } catch (DocumentClientException e) {
            System.out.println("Failed to Upload Document");
            e.printStackTrace();
        }
    }
}

Any ideas where my real leak is? Is there somewhere i need a string builder? does string just do memory funny and i need to give it a higher ceiling to stretch out into then it will be fine?

Edit: i did some bench-marking so i could actually graph the behavior in order to better understand what the GC is doing

00000 Threads - 457 MB
01000 Threads - 535 MB
02000 Threads - 545 MB
03000 Threads - 549 MB
04000 Threads - 551 MB
05000 Threads - 555 MB
2 hours later - 595 MB
06000 Threads - 598 MB
07000 Threads - 600 MB
08000 Threads - 602 MB

It seems asymptotic but what was most interesting to me is that while i was out attending meetings and eating lunch it decided to grow 40mb on its own. i checked with my team and nobody used the app during that time. Not sure what to make of that either

kure
  • 89
  • 5
  • take a look at this post http://stackoverflow.com/questions/65668/why-to-use-stringbuffer-in-java-instead-of-the-string-concatenation-operator and also http://stackoverflow.com/questions/18406703/when-will-a-string-be-garbage-collected-in-java – JavaHopper Aug 03 '16 at 21:24
  • Clearly the string vs strong builder issue has nothing to do with whether you get a memory leak or not. How do you know you're having a leak to begin with? If the method is called too often before previous iterations finish you'll run out of memory. on the other hand if you still have free memory there's no reason to start a GC collection even if some objects would be collectable. This doesn't look like there's a memory leak anywhere. – Voo Aug 04 '16 at 10:57
  • @Voo If i run the application PCF reports about 400mb memory used. if i tell it to spin up a couple thousand threads the memory usage goes up to 450mb. If i check on it a few hours later its still at 450mb – kure Aug 04 '16 at 13:01
  • That's not necessarily a memory leak, that's just the GC deciding that it doesn't collect any memory if there's no memory pressure. – Voo Aug 04 '16 at 13:52
  • @Voo interesting, so the GC is lazy in java? I come from .NET land where i've never seen this behavior. do you have any links that detail Java's GC's behavior that you think do a good job of explaining it, or should i just dig into the oracle stuff? – kure Aug 04 '16 at 14:36
  • @Kure The .NET GC doesn't give any more guarantees than the HotSpot GCs do. Any modern GC that isn't ref counted is "lazy" to some degree. The idea is simple: Garbage collection costs time, so if you have enough free memory there's no reason to do one. you also have to be careful what you're measuring: HotSpot won't give memory back to the OS so the memory allocated to the process will never go down (the CLR on the other hand does). you can use eg visualvm to see whether you have an actual memory leak by forcing a full GC. – Voo Aug 04 '16 at 14:43
  • What type of OOM error are you getting? Can you link the stacktrace you receive? Or are you not running out of memory (just yet) but you're curious? – John Vint Aug 04 '16 at 18:07
  • @JohnVint All that PCF gives me is this: Thu Aug 04 2016 16:42:24 GMT-0400 (Eastern Daylight Time) [API] OUT App instance exited with guid 46e6d13c-5b18-44d0-b183-11dda9ad0c86 payload: {"cc_partition"=>"default", "droplet"=>"46e6d13c-5b18-44d0-b183-11dda9ad0c86", "version"=>"c4fbc295-f300-40e7-8708-eb55e170b561", "instance"=>"8127fcd1675b49b28951d30688b1e51d", "index"=>0, "reason"=>"CRASHED", "exit_status"=>255, "exit_description"=>"out of memory", "crash_timestamp"=>1470343344} Reading through the logs i only see debug output statements, and network errors right prior to crash, no OOM – kure Aug 04 '16 at 20:52

2 Answers2

0

it is becasue you keep adding String. Java doesn't GC String pool automatically

Java String Pool

String spaceWaster = "";
            for (int i = 0; i < 10000; i ++)
                spaceWaster += "W";

use StringBuilder instead

CSK
  • 352
  • 1
  • 6
  • once the loop is over though it should finish the method, be done with spaceWaster, and dispose of the field, which it does not. StringBuilder will not make make a difference, i could replace spaceWaster with "int[] spacewaster2 = new int[1000000]" and the leak remains – kure Aug 03 '16 at 22:54
  • The only strings interned in Java are literals (the "W" and the empty one in this case) or strings you explicitly call intern on. Everything else isn't for obvious reasons. – Voo Aug 05 '16 at 06:07
-1

use stringbuilder was correct

don't think you need 2000 threads.

A better design could be A Queue for tasks(string/docs) and a thread pool to process the string/docs.

user3644708
  • 2,466
  • 2
  • 16
  • 32
  • i agree that threadpooling would be better than using an AtomicInteger to keep track of running thread count but i have no experience with the pool in java and am currently doing more of a PoC than working at production code. also the program is actually reading from a queue and spinning up threads to handle the queue. the whole reason that i'm threading at all is because Azure Document DB (a NoSQL offering) takes an unacceptable amount of time to add a new record but scales well to multiple calls at once. – kure Aug 04 '16 at 17:45
  • ok, I see. depends on your string, it probably went to `Permanent Generation` directly, but not `eden space`. you need to tuning jvm parameters – user3644708 Aug 04 '16 at 18:53
  • No dynamically allocated strings don't go to the permanent generation, there's absolutely no reason to tune any parameters there. Hell not even interned strings end up there any more for a while. – Voo Aug 05 '16 at 06:04