
I learned about the Java String Pool recently, and there are a few things that I don't quite understand.

When assigning a string literal, a new String will be created in the String Pool if one with the same contents doesn't exist there already.

String a = "foo"; // Creates a new string in the String Pool
String b = "foo"; // Refers to the already existing string in the String Pool

When using the String constructor, I understand that regardless of the String Pool's state, a new string will be created in the heap, outside of the String Pool.

String c = new String("foo"); // Creates a new string in the heap

I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.

String d = new String("bar"); // Creates a new string in the String Pool and in the heap

I didn't find any further information about this, but I would like to know if that's true.

If that is indeed true, then why? Why does Java create this duplicate string? It seems completely redundant to me, since strings in Java are immutable.

Another thing that I would like to know is how the .intern() function of the String class works: Does it just return a pointer to the string in the String Pool?

And finally, in the following code:

String s = new String("Hello");
s = s.intern();

Will the garbage collector delete the string that is outside the String Pool from the heap?

2 Answers


You wrote

String c = new String("foo"); // Creates a new string in the heap

I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.

That’s somewhat correct, but you have to read the code correctly. Your code contains two String instances. First, you have the string literal "foo" that evaluates to a String instance, the one that will be inserted into the pool. Then, you are creating a new String instance explicitly, using new String(…) calling the String(String) constructor. Since the explicitly created object can’t have the same identity as an object that existed prior to its creation, two String instances must exist.
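The two instances can be observed directly with `==`, which compares identities rather than contents. A minimal sketch; the class and method names are made up for illustration:

```java
// Sketch: a string literal and an explicitly constructed String are two distinct objects.
class LiteralVsNew {
    // Returns the instance produced by evaluating the literal "foo"
    static String literal() {
        return "foo";
    }

    // Returns a brand-new String whose contents are copied from the literal "foo"
    static String constructed() {
        return new String("foo");
    }

    public static void main(String[] args) {
        String a = literal();
        String b = literal();
        String c = constructed();

        System.out.println(a == b);          // true: both evaluate to the same pooled instance
        System.out.println(a == c);          // false: new String(...) always yields a new object
        System.out.println(a.equals(c));     // true: same contents
        System.out.println(c.intern() == a); // true: intern() returns the pooled instance
    }
}
```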

Why does Java create this duplicate string? It seems completely redundant to me, since strings in Java are immutable.

Well, it does so because you told it to. In theory, this construction could get optimized, skipping the intermediate step that you can’t perceive anyway. But the first assumption for a program’s behavior should be that it does precisely what you have written.

You could ask why there’s a constructor that allows such a pointless operation. In fact, this has been asked before and this answer addresses this. In short, it’s mostly a historical design mistake, but this constructor has been used in practice for other technical reasons; some do not apply anymore. Still, it can’t be removed without breaking compatibility.

String s = new String("Hello");
s = s.intern();

Will the garbage collector delete the string that is outside the String Pool from the heap?

Since the intern() call will evaluate to the instance that had been created for "Hello", which is distinct from the instance created via new String(…), the latter will definitely be unreachable after the second assignment to s. Of course, this doesn’t say whether the garbage collector will reclaim the string’s memory, only that it is allowed to do so. But keep in mind that the majority of the heap occupation will be the array that holds the character data, which will be shared between the two string instances (unless you use a very outdated JVM). This array will still be in use as long as either of the two strings is in use. Recent JVMs even have the String Deduplication feature, which may cause other strings with the same contents in the JVM to use this array (allowing their formerly used arrays to be collected). So the lifetime of the array is entirely unpredictable.
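The identity part of this can be checked, although the reclamation itself cannot. A sketch (class and method names are illustrative) that runs the question's two statements and compares the result with the literal; whether and when the discarded object and its character array are actually collected is not observable this way:

```java
class InternReassign {
    // Runs the question's two statements and returns the final value of s.
    static String internedHello() {
        String s = new String("Hello"); // a new object, distinct from the literal's instance
        s = s.intern();                 // s now refers to the pooled instance for "Hello";
                                        // the object from new String(...) is no longer reachable
        return s;
    }

    public static void main(String[] args) {
        String s = internedHello();
        System.out.println(s == "Hello");             // true: same instance as the literal
        System.out.println(s == new String("Hello")); // false: a fresh object is never identical to it
    }
}
```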

Holger

Q: I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap. [] I didn't find any further information about this, but I would like to know if that's true.

It is NOT true. A string created with new is not placed in the string pool ... unless something explicitly calls intern() on it.
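One way to check this without relying on any implementation detail: build two strings with the same, previously unseen contents using new, then intern the second one. If the constructor had pooled the first one, intern() would have to return it. The sketch assumes that the text of a random UUID has not been interned by anything else; the class and method names are made up:

```java
import java.util.UUID;

class NewIsNotPooled {
    // Hypothetical helper: contents taken from a random UUID, so (practically) no
    // literal or earlier intern() call has put an equal string into the pool.
    static char[] uniqueChars() {
        return UUID.randomUUID().toString().toCharArray();
    }

    // Returns true if constructing a String with new put that very object into the pool.
    static boolean constructorPoolsItsResult() {
        char[] chars = uniqueChars();
        String first = new String(chars);  // created with new, never interned
        String second = new String(chars); // equal contents, different object
        // If 'first' had been pooled by its constructor,
        // interning 'second' would have to return 'first'.
        return second.intern() == first;
    }

    public static void main(String[] args) {
        System.out.println(constructorPoolsItsResult()); // false
    }
}
```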

Q: Why does java create this duplicate string?

Because the JLS specifies that every new generates a new object. It would be counter-intuitive if it didn't (IMO).
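The difference between the two objects is observable even though strings are immutable, for example through an identity-based collection, which is one reason the JVM cannot silently hand back an existing instance from new. A small sketch; the helper name is illustrative:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class EveryNewIsNew {
    // Inserts two separately constructed, equal strings and reports how many keys the map holds.
    static int distinctKeys(Map<String, Integer> map) {
        map.put(new String("abc"), 1);
        map.put(new String("abc"), 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(new HashMap<>()));         // 1: keys compared with equals()
        System.out.println(distinctKeys(new IdentityHashMap<>())); // 2: keys compared with ==
    }
}
```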

The fact that it is nearly always a bad idea to use new String(String) is not a good reason to make new behave differently in this case. The real answer is that programmers should learn not to write that ... except in the extremely rare cases where it is necessary to do so.


Q: Another thing that I would like to know is how the intern() function of the String class works: Does it just return a pointer to the string in the String Pool?

The intern method always returns a pointer to a string in the string pool. That string may or may not be the string you called intern() on.

There have been different ways that the string pool was implemented.

  • In the original scheme, interned strings were held in a special heap called the PermGen heap. In that scheme, if the string you were interning was not already in the pool, then a new string would be allocated in PermGen space, and the intern method would return that.

  • In the current scheme, interned strings are held in the normal heap, and the string pool is just a (private) data structure. When the string being interned is not in the pool, it is simply linked into the data structure. A new string does not need to be allocated.
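Under the current scheme the consequence can be observed: interning a string whose contents are not yet pooled hands back the same object. This is implementation behavior, not something the JLS promises; the sketch assumes a HotSpot-based JDK 7 or later, and that the text of a random UUID has never been interned before. Under the old PermGen scheme the first comparison would be false, because a copy was returned. The class and method names are illustrative.

```java
import java.util.UUID;

class InternLinksInstance {
    // Returns true if intern() hands back the very object it was called on.
    // Assumption: the random UUID text has never been interned before.
    static boolean internReturnsSameObject() {
        String s = new String(UUID.randomUUID().toString().toCharArray());
        return s.intern() == s;
    }

    // Once contents are pooled, later intern() calls with equal contents return the pooled object.
    static boolean laterInternFindsIt() {
        char[] chars = UUID.randomUUID().toString().toCharArray();
        String pooled = new String(chars).intern();
        String second = new String(chars);
        return second.intern() == pooled;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsSameObject()); // true on HotSpot JDK 7+ (false under PermGen)
        System.out.println(laterInternFindsIt());      // true
    }
}
```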


Q: Will the garbage collector delete the string that is outside the String Pool from the heap?

The rule is the same for all Java objects, no matter how they were created, and irrespective of where (in which "space" or "heap" in the JVM) they reside.

If an object is not reachable from the running application, then it is eligible for deletion by the garbage collector.

That doesn't mean that an unreachable object will be garbage collected in any particular run of the GC. (Or indeed ever ... in some circumstances.)

The above rule equally applies to the String objects that correspond to string literals. If it ever happens that a literal can never be used again, then it may be garbage collected.

That doesn't normally happen. The JVM keeps a hidden reference to each string literal object in a private data structure associated with the class that defined it. Since classes normally exist for the lifetime of the JVM, their string literal objects remain reachable. (Which makes sense ... since the application may need to use them.)

However, if a class is loaded using a dynamically created classloader, and that classloader becomes unreachable, then so will all of its classes. So it is actually possible for a string literal object to become unreachable. If it does, it may be garbage collected.

Stephen C
  • Thank you for your answer, it was very helpful for my understanding. After some more reading I understood why using the constructor affects the pool. When I use `String s = new String("Hello")`, the literal "Hello" is first inserted into the pool, and only then is it used by the constructor, which creates another string in the heap. That is what the article I read meant. Another question following your answer: What is the String constructor good for? I can't find any advantages it has over using string literals. – Patrick Nilexis Oct 28 '20 at 11:35
  • The `new String(String)` constructor is only useful in those exceedingly rare situations where you **need to** create a unique `String` object. (For instance, in artificial examples that are designed to show that string literals are interned. Or pointless quiz questions.) The other `String` constructors are more useful. – Stephen C Oct 28 '20 at 13:03
  • @StephenC those "exceedingly rare" cases are now part of the [jdk itself](https://bugs.openjdk.java.net/browse/JDK-8247605), as seen [here](https://hg.openjdk.java.net/jdk/jdk/rev/20d92fe3ac52), in jdk-16. – Eugene Oct 28 '20 at 13:32
  • I mean ... exceedingly rare in normal application code. This is only being done in that context because the JLS says it must be done. – Stephen C Oct 28 '20 at 14:21
  • It is worth noting that the `+` concatenation operator was specified to create a new `String` way back in the Java 1.0 spec. I *imagine* that if they were starting again from a clean slate in 2020, they wouldn't specify Java `+` to do that. – Stephen C Oct 28 '20 at 14:40
  • I don’t think that too much real-life code relies on string concatenation to create new objects. So a specification change, saying that string concatenation evaluates to an object of unspecified identity, wouldn’t be too disruptive. There could be a phase where the specification says that it is unspecified, hence application code should make no assumption, while the implementation behavior stays the same, before they start exploiting the new rule to gain performance or reduce memory consumption. But well, if we could have a clean start, it shouldn’t be an overloaded `+` operator… – Holger Oct 29 '20 at 08:35
  • @Holger Tagir questioned the same thing before making that change. He got zero answers, everywhere he poked. I am not sure anyone knows the reasons the specification was written in that manner. – Eugene Oct 29 '20 at 14:59
  • @StephenC you mentioned that the `intern()` function doesn't create a new string in the pool, but links the string from the heap into the pool. When I searched for information about the intern function, every site mentioned it actually makes a copy into the pool. Can you link me to a source that confirms that? – Patrick Nilexis Oct 30 '20 at 10:31
  • The information that you found probably refers to the old (PermGen) string pool implementations. However, no, I can't find a reference ... unless OpenJDK source code counts. (In Java 11, the interning code is in .../src/hotspot/share/classfile/stringTable.cpp). It is a little more complicated than I thought, because a string that doesn't match a previously interned string is dedup'd before it is added to the hash table. However, no new string is allocated. – Stephen C Oct 30 '20 at 13:22
  • @PatrickNilexis the problem is, everyone can set up a web page. And it’s easier to retell the story read on another web page than to research and develop an understanding before publishing an article. Just consider how Java works. Objects are manipulated through references. Collections and Maps store references. `intern()` and string literals evaluate to references. Since the purpose of the pool is to evaluate to references, it is, of course, a data structure maintaining references. The actual storage of the objects is irrelevant, even when `intern()` made a copy prior to Java 7. – Holger Nov 02 '20 at 09:51
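The `+` behavior discussed in the comments can be seen in a short sketch. Under the current JLS rule for the string concatenation operator, a concatenation that is not a constant expression yields a newly created String, while a constant expression is folded at compile time and interned like a literal. Class and method names are illustrative.

```java
class ConcatIdentity {
    static final String PREFIX = "ab"; // a constant variable

    // Constant expression: folded at compile time and interned like a literal.
    static String compileTimeConcat() {
        return PREFIX + "c";
    }

    // Not a constant expression: the JLS requires a newly created String.
    static String runTimeConcat(String prefix) {
        return prefix + "c";
    }

    public static void main(String[] args) {
        System.out.println(compileTimeConcat() == "abc");          // true
        System.out.println(runTimeConcat("ab") == "abc");          // false
        System.out.println(runTimeConcat("ab").intern() == "abc"); // true
    }
}
```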