2

I am profiling the garbage behavior of java.lang.String and it looks like each time you instantiate a string for the first time inside any class it always generate garbage. Would anyone know why?

public abstract class AbstractTest {

    protected static String SERGIO = "sergio";

    private String someValue;

    public void init() {
        this.someValue = new String(SERGIO);
    }
}

public class Test extends AbstractTest {

    private static String JULIA = "julia";

    private Runtime runtime = Runtime.getRuntime();
    private String anotherValue;
    private String yetAnother;

    private void gc() throws InterruptedException {
        System.gc();
        Thread.sleep(100);
    }

    private long usedMemory() {
        return runtime.maxMemory() - runtime.freeMemory();
    }

    public void test() throws Exception {
        gc();
        this.anotherValue = new String(SERGIO); // a bunch of garbage is created!
        long usedMemory = usedMemory();
        gc();
        long usedMemoryAfterGC = usedMemory();
        System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
        gc();
        this.yetAnother = new String(JULIA); // no more garbage
        usedMemory = usedMemory();
        gc();
        usedMemoryAfterGC = usedMemory();
        System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
    }

    public static void main(String[] args) throws Exception {
        Test t = new Test();
        t.test();
    }

Output:

Collected: 704336
Collected: 0

That 's fine. First time it creates garbage, then subsequent instantiations produce no garbage.

What is weird is when you force a string creation in the superclass, it still creates garbage in the subclass the first time you instantiate a String there:

public void test() throws Exception {
    gc();
    init(); // creates a String in the superclass
    gc();
    this.yetAnother = new String(JULIA);
    long usedMemory = usedMemory();
    gc();
    long usedMemoryAfterGC = usedMemory();
    System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
}

Output:

Collected: 348648

Any idea why?

(By the way I am running this on MAC and JDK 1.6.0_37)

EDIT1: I changed the code slightly to make it clear that string internalization is not the culprit here, at least it does not look like it is.

EDIT2: If you change String to Object throughout the code, you get the same garbage so I guess it has to do with how object allocation through new happens in Java. The first time you allocate an object in a class you get the garbage. The second time you don't. Weird it is per class.

EDIT3: I wrote a blog article where I talk about how to force a GC to profile your applications for garbage creation, like I am doing in the code above.

Kara
  • 6,115
  • 16
  • 50
  • 57
TraderJoeChicago
  • 6,205
  • 8
  • 50
  • 54
  • 2
    It might be worth noting that calling System.gc(); does not guarantee that the garbage collector will run immediately. Its kind of like a "hey, when you get a chance, it would be nice if you'd run" – user489041 Nov 21 '12 at 21:06
  • Don't understand what you mean with "garbage" - what did you expect? – Andy Nov 21 '12 at 21:06
  • garbage = de-referenced objects in the heap that will be eventually collected by the gc. If you noticed, I am not the one producing the garbage but the JVM is doing that under the hood. The JDK classes often do that, but this time it looks kind of weird. – TraderJoeChicago Nov 21 '12 at 21:11
  • @user489041 He is sleeping after the System.gc() so there is a good chance that the GC will run. And it looks like it is because stuff is getting collected... Could it be that the freeMemory() is just broken somehow? – chrisapotek Nov 21 '12 at 21:13
  • BTW `new String("julia")` can and should be replaced with `"julia"`. – Tomasz Nurkiewicz Nov 21 '12 at 21:15
  • 2
    @TomaszNurkiewicz:That would cause it to be pooled.So it matters regarding performance.But I don't think that this is the issue in this question – Cratylus Nov 21 '12 at 21:17

2 Answers2

3

A literal String inside a class will be "interned" on first reference (if not before). Interning will generally involve discarding the original version of the String and using the interned version, and a few more objects may be created and discarded in the process.

(And, of course, there's really no reliable way to detect how much garbage is being generated in any single operation, without special internal hooks, so your measurements are suspect at best.)

Hot Licks
  • 47,103
  • 17
  • 93
  • 151
  • I don't think that's the issue here. "interned" means added to the [internal pool](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#intern()). I added `String s1 = "sergio"; String s2 = "julia";` before the instantiation and it had no effect in the output. The strings I am creating with _new_ I am not internalizing on purpose because I want a new object in the heap. – TraderJoeChicago Nov 21 '12 at 21:26
  • 1
    @Cratylus -- There most definitely is an intern. According to the JVM spec, every statically-defined (literal) string in a class must be interned before it can be used. And, in the case of a `static final` string "inherited" from the parent class, the value is treated as a literal in the child class -- the value in the parent class is not referenced after compilation. If you do a `javap` on the child class you will see those strings in the literal pool. – Hot Licks Nov 21 '12 at 22:35
  • @HotLicks I am not doing any internalization myself in the lines I am measuring as you can see in my code. However I don't know if _new String(...)_ is doing any internalization internally. The garbage created seems excessive for internalized strings. Are you also saying that the string pool is not global? What does an internalized string literal has to do with the class it is in? A string literal "foo" on class A is the same object of string literal "foo" on class B, coming from the same string pool. No? – TraderJoeChicago Nov 21 '12 at 22:58
  • It's not `new String` that does this, it's reference to a string literal such as "julia". The first time such a literal is referenced it must be interned. (Interning is, in a way, the process for creating a "global" pool of string constants from the individual string literals in each class. This is why the literal "foo" will have the same address wherever you reference it.) – Hot Licks Nov 21 '12 at 23:26
  • Well, as you can see in my (updated) code there is nothing being internalized in the lines I am measuring. Therefore it does not look like internalization or the string pool have anything to do with the problem. :( – TraderJoeChicago Nov 21 '12 at 23:45
  • `new String(SERGIO)` references the static final string "sergio" in AbstractTest. Since the string is static final, it gets copied into Test (and you can see it there, in the constant pool, if you `javap` Test). Before `new String` can be called, the constant pool entry (that's used as a parameter) must be referenced, and that causes "sergio" to be interned. – Hot Licks Nov 21 '12 at 23:50
  • (Hint: Make SERGIO *not* be `final, `. Then Test will reference AbstractTest rather than use it's own string literal.) – Hot Licks Nov 21 '12 at 23:52
  • Thanks for trying to help out, but I did the changes you suggested with no effect in the output. If you can, copy the code and run it in your machine to see the garbage. I am editing the OP to remove the final keyword. We still haven't been able to prove the problem is related to the string pool or internalization. :( – TraderJoeChicago Nov 22 '12 at 00:04
  • Well, there's no way that any of this could amount to 348648 bytes. Your measurements are in no way representative of the amount of heap being gone through during the interning operations. As we said right off the bat, you can't really measure heap usage the way you're attempting to. – Hot Licks Nov 22 '12 at 00:23
  • It looks the problem is related to object instantiation in general, and not to Strings in particular. See EDIT2 in the OP. Thx! – TraderJoeChicago Nov 22 '12 at 01:59
  • There's no "problem". GC doesn't occur until the triggering conditions "build up". System.gc() is only a "suggestion" that GC be performed, and it won't occur unless other parameters are met. In your example you've managed to set things up so GC occurs at that particular point in time (or, rather, so GC *completes* at that particular point in time -- 100 ms is likely not long enough for a single GC to complete). – Hot Licks Nov 22 '12 at 02:22
  • Whether GC should have happened or not is out of the question. **The fact is that GC is happening** and garbage is being collected. If GC was not happening I would see ZERO as collected garbage. So that has nothing to do with the original question. :) Of course this is not a problem or a bug. Just some weird detail that I would like to know how it works under the hood. – TraderJoeChicago Nov 22 '12 at 02:27
  • GC is happening. It possibly hasn't happened for a long time. So it collects a non-trivial amount of garbage. 350K isn't a lot, from the standpoint of what's needed to start up a JVM. – Hot Licks Nov 22 '12 at 03:16
1

After reading Peter's answer here it is clear that TLAB is the culprit. If you use the option -XX:-UseTLAB to disable TLAB the problem goes away. From what I understood from here, it looks like with TLAB a thread allocates a big chunk of memory initially to avoid race conditions later. I was able to prove that TLAB was the culprit by setting a bigger size for it with the -XX:TLABSize=64m and see this amount being allocated.

Community
  • 1
  • 1
TraderJoeChicago
  • 6,205
  • 8
  • 50
  • 54