
I am confused about StringPool in Java. I came across this while reading the String chapter in Java. Please help me understand, in layman's terms, what StringPool actually does.

the Tin Man
Subhransu Mishra

4 Answers


This prints true (even though we don't use the equals method, which is the correct way to compare strings):

    String s = "a" + "bc";
    String t = "ab" + "c";
    System.out.println(s == t);

When the compiler optimizes your string literals, it sees that both s and t have the same value ("a" + "bc" and "ab" + "c" are constant expressions, so both are folded into the literal "abc"), and thus you need only one string object. This is safe because String is immutable in Java.
As a result, both s and t point to the same object, and a little memory is saved.
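A minimal runnable sketch of the same idea, which also contrasts compile-time folding with concatenation done at run time. The class name ConstantFolding and the variable prefix are illustrative; the behavior relies on the JLS rule that only constant expressions are folded (and their results interned), while any other concatenation creates a new String object.

```java
public class ConstantFolding {
    public static void main(String[] args) {
        // Both right-hand sides are compile-time constant expressions,
        // so javac folds each of them into the single literal "abc".
        String s = "a" + "bc";
        String t = "ab" + "c";
        System.out.println(s == t);            // true

        // prefix is a plain (non-final) local variable, so this
        // concatenation happens at run time and yields a new object.
        String prefix = "ab";
        String u = prefix + "c";
        System.out.println(s == u);            // false
        System.out.println(s.equals(u));       // true

        // intern() returns the pooled copy, which is the literal "abc".
        System.out.println(s == u.intern());   // true
    }
}
```

The last line shows that a run-time string can still be mapped onto the pooled literal, but only if you ask for it explicitly.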

The name 'string pool' comes from the idea that all already-defined strings are stored in a 'pool', and before creating a new String object the compiler checks whether such a string is already defined.

Nikita Rybak
  • Java has wrapper types for primitive types, and those classes are immutable too: Integer, Character, Double, etc. Do they also have a pool to save memory? If not, what is special about String that it has a pool? – Punith Raj Feb 02 '15 at 09:21
  • @PunithRaj I'm not actually sure! I doubt it, however. int, for example, is only 4 bytes, so you don't end up saving that much by having two Integers point to the same place in memory. On the contrary, having to maintain an 'integer pool' to spot repetitive values is likely to waste more memory than you'll save by avoiding duplicate values. – Nikita Rybak Feb 03 '15 at 10:49
  • @PunithRaj String is not a primitive data type (technically/implementation-wise), and String does not have a wrapper class the way char/int do. – user3240361 Feb 17 '15 at 10:49
  • @PunithRaj `String` is not primitive like the other types you give but is often treated as such - so it is rather "special" in the java language. However, java does do a similar optimization with wrapper classes: [If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.](http://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.7) These common values are "pooled" much like `String`s. – ethanbustad Mar 17 '15 at 17:37
  • Good comment @PunithRaj, you should make it a separate question. – orchidrudra May 31 '15 at 14:11
  • Overall, 3 different objects are created in the string pool area: "a" and "ab" are unreferenced, and "abc" is the referenced object. – Ajay Takur Oct 17 '18 at 14:16
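The boxing rule quoted from JLS 5.1.7 in the comments can be checked with a short sketch. BoxingCache is an illustrative class name; only the values inside the quoted ranges are guaranteed to be shared, so the 128 case is printed without a promised result.

```java
public class BoxingCache {
    public static void main(String[] args) {
        // JLS 5.1.7: boxing an int in -128..127 always yields the same object.
        Integer a = 127;
        Integer b = 127;
        System.out.println(a == b);        // true

        // The same guarantee holds for char values 0 through 127.
        Character x = 'a';
        Character y = 'a';
        System.out.println(x == y);        // true

        // Outside the guaranteed range the JLS makes no promise: this may
        // print true or false depending on the JVM and its settings.
        Integer c = 128;
        Integer d = 128;
        System.out.println(c == d);

        // equals() compares values, so it is correct for any value.
        System.out.println(c.equals(d));   // true
    }
}
```

As with strings, the cache is an optimization you can observe, not one you should rely on: compare boxed values with equals.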

I don't think it actually does much; it looks like it's just a cache for string literals. If you have multiple Strings whose values are the same, they'll all point to the same string literal in the string pool.

    String s1 = "Arul"; //case 1 
    String s2 = "Arul"; //case 2 

In case 1, the literal "Arul" is created and kept in the pool, and s1 refers to it. In case 2, s2 refers to the same pooled object as s1; no new object is created.

    if(s1 == s2) System.out.println("equal"); //Prints equal. 

    String n1 = new String("Arul"); 
    String n2 = new String("Arul"); 
    if(n1 == n2) System.out.println("equal"); //No output.  

http://p2p.wrox.com/java-espanol/29312-string-pooling.html

RENO
MStodd

When the JVM loads classes, or otherwise sees a literal string, or some code interns a string, it adds the string to a mostly-hidden lookup table that has one copy of each such string. If another copy is added, the runtime arranges it so that all the literals refer to the same string object. This is called "interning". If you say something like

    String s = "test";
    return (s == "test");

it'll return true, because the first and second "test" are actually the same object. Comparing interned strings this way can be much, much faster than String.equals, as there's a single reference comparison rather than a bunch of char comparisons.
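To see that this lookup table is shared across classes, here is a minimal sketch. The nested classes First and Second are hypothetical names; they compile to separate class files, each with its own "test" literal, yet JLS 3.10.5 guarantees that identical literals refer to the same String object, even across classes and packages.

```java
public class LiteralsAcrossClasses {
    // Two unrelated nested classes, each with its own "test" literal.
    static class First {
        static String value() { return "test"; }
    }

    static class Second {
        static String value() { return "test"; }
    }

    public static void main(String[] args) {
        // Both literals are interned when their classes are loaded,
        // so a single reference comparison is enough here.
        System.out.println(First.value() == Second.value());   // true
        System.out.println(First.value() == "test");            // true
    }
}
```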

You can add a string to the pool by calling String.intern(), which gives you back the pooled version of the string. That could be the same string you're interning, but you'd be crazy to rely on that: you often can't be sure exactly what code has been loaded and run up to now and has interned the same string. The pooled version (the string returned from intern) will be the same object as any identical literal. For example:

    String s1 = "test";
    String s2 = new String("test");  // "new String" guarantees a different object

    System.out.println(s1 == s2);  // should print "false"

    s2 = s2.intern();
    System.out.println(s1 == s2);  // should print "true"
cHao
  • I actually don't think it's done at run time. Even the simplest strings constructed with methods won't be pooled. E.g., the example from my answer won't work if I use _concat_ instead of _+_ – Nikita Rybak Sep 27 '10 at 06:41
  • @Nikita: That's because `concat` can't be as easily optimized away. The strings catted together with `+` would likely be pre-catted by any self-respecting compiler, because the value never changes. But the compiler can't really guess whether a function will return the same value all the time (some don't), so it wouldn't try. If you use `concat` instead in your example, "ab", "c", "a", and "bc" would be interned, but "abc" wouldn't (because it's not a literal, and your code doesn't `intern` it). However, with `+` a decent compiler will see that both strings are "abc" and compile that. – cHao Sep 27 '10 at 08:25
  • The interning would *have* to be done at runtime, because (1) the pool always starts out empty, and (2) two different classes could each have "abc" in them. If interning were a compile-time thing and both classes ended up being loaded, there'd end up being two "abc"s in the string pool, which defeats the whole purpose of the string pool. – cHao Sep 27 '10 at 08:45
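The `concat` versus `+` point from this exchange can be sketched as follows. ConcatVsPlus is an illustrative class name. The `concat` result being a distinct object reflects how the JDK behaves in practice for a non-empty argument; the `final` case relies on the JLS notion of a constant variable, which keeps the expression a compile-time constant.

```java
public class ConcatVsPlus {
    public static void main(String[] args) {
        String abc = "abc";

        // "ab" + "c" is a constant expression, folded to the literal "abc".
        System.out.println(("ab" + "c") == abc);               // true

        // concat() is an ordinary method call evaluated at run time;
        // with a non-empty argument it returns a newly created String.
        System.out.println("ab".concat("c") == abc);           // false
        System.out.println("ab".concat("c").intern() == abc);  // true

        // A final local initialized with a constant is a "constant variable",
        // so expressions using it are still folded at compile time.
        final String ab = "ab";
        System.out.println((ab + "c") == abc);                 // true
    }
}
```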

Let's start with a quote from the virtual machine spec:

Loading of a class or interface that contains a String literal may create a new String object (§2.4.8) to represent that literal. This may not occur if a String object has already been created to represent a previous occurrence of that literal, or if the String.intern method has been invoked on a String object representing the same string as the literal.

This may not occur: this is a hint that there's something special about String objects. Usually, invoking a constructor always creates a new instance of the class. Strings 'created' with literals are different: they are stored in a global store (pool), or at least their references are kept in a pool, and whenever a new instance of an already known String is needed, the VM returns a reference to the object from the pool. In pseudocode, it may go like this:

    1: a := "one"
       --> if (pool[hash("one")] == null)   // true
               pool[hash("one")] := "one"
           return pool[hash("one")]

    2: b := "one"
       --> if (pool[hash("one")] == null)   // false, "one" already in pool
               pool[hash("one")] := "one"
           return pool[hash("one")]
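The pseudocode above can be turned into a small runnable Java model. SimplePool is a hypothetical toy class, not a JDK API, and it is not how the JVM's own pool is implemented; it only models the "look up, insert if missing, return the pooled copy" logic. It keys a HashMap by the string itself, so keys are compared with equals() and two strings that merely share a hash code are never confused.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a string pool: one canonical object per distinct value.
public class SimplePool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled copy of s, storing s itself if the value is new.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);  // null if s was just added
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        SimplePool p = new SimplePool();
        // new String(...) guarantees two distinct objects with equal content.
        String a = p.intern(new String("one"));  // pool empty: this object is stored
        String b = p.intern(new String("one"));  // already pooled: a is returned
        System.out.println(a == b);        // true
        System.out.println(a.equals(b));   // true
    }
}
```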

So in this case, variables a and b hold references to the same object, and we have (a == b) && (a.equals(b)) == true.

This is not the case if we use the constructor:

    1: a := "one"
    2: b := new String("one")

Again, "one" is put in the pool, but then we create a new instance from the same literal, so this time (a == b) && (a.equals(b)) == false.

So why do we have a String pool? Strings, and especially String literals, are widely used in typical Java code, and they are immutable. Immutability makes it safe to cache Strings, which saves memory and increases performance (less effort for creation, less garbage to be collected).

As programmers we don't have to care much about the String pool, as long as we keep in mind:

  • (a == b) && (a.equals(b)) may be true or false (always use equals to compare Strings)
  • Don't use reflection to change the backing char[] of a String (as you don't know who is actually using that String)
Andreas Dolk
  • If you *do* care about the string pool, there's the potential for massive performance boosts in applications that use a small group of strings extensively, usually as tokens or keywords. Once the strings are interned, comparison becomes a single `==` rather than the function call, two length() calls, and a potential bunch of char comparisons that'd happen with `equals`. – cHao Sep 27 '10 at 09:09
  • @cHao For safety and consistency you can still use `String.equals()` with interned strings, because `String.equals()` first does an `==` comparison – bcoughlan Oct 01 '14 at 11:00
  • @bcoughlan: `==` is as safe and consistent as `equals` -- it's just misunderstood. People who use it with objects in general fall into two categories. There are those who don't understand value vs identity semantics (and that == with reference-types compares identity) -- those people *should* always use `String.equals`. Then there are those who do understand, but are consciously *choosing* identity. And that works just as reliably, as long as you know where your objects came from. There's a reason `==` works with objects -- and in particular, why it doesn't just call `equals`. – cHao Oct 01 '14 at 14:02
  • @cHao The key is "as long as you know where your objects came from". `if (s1==s2)` looks suspiciously like a bug to most people (and is flagged by FindBugs). I was just pointing out that you can still get the performance boosts of comparisons with String pooling without writing code that assumes strings are interned – bcoughlan Oct 01 '14 at 14:59
  • @bcoughlan: You can get *some* of the boosts, but you still have a method call. In [my tests](http://ideone.com/e4A5fG), that method call adds significantly -- like +100% -- to the overall run time of the function. And this is in a test intended to be at least a tiny bit realistic. – cHao Oct 01 '14 at 17:23
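The "small group of strings used as tokens or keywords" case from this discussion might look like the sketch below. InternedTokens, tokenize and countKeywords are hypothetical names, and the whitespace-only tokenizer is a simplifying assumption. The point is the one cHao makes: identity comparison is only safe because every token is known to come from a path that interns it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical keyword scanner: tokens are interned once, after which
// keyword checks are plain reference comparisons (==) instead of equals().
public class InternedTokens {
    // Compile-time literals are already in the string pool.
    static final String IF = "if";
    static final String ELSE = "else";

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String raw : source.trim().split("\\s+")) {
            // split() produces fresh String objects; intern() maps each one
            // to the canonical pooled instance with the same contents.
            tokens.add(raw.intern());
        }
        return tokens;
    }

    static int countKeywords(List<String> tokens) {
        int count = 0;
        for (String tok : tokens) {
            // Safe only because every token came from tokenize(), i.e. is interned.
            if (tok == IF || tok == ELSE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("if x then y else if z");
        System.out.println(countKeywords(tokens));   // 3
    }
}
```

If tokens could also arrive from somewhere that does not intern them, equals() (which starts with an `==` check) is the safer choice, as bcoughlan notes.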