11

I want to understand the string pool more deeply. Please help me find the source file that contains its implementation in Java.

My question is really about finding the source code or implementation of the String Pool, so that I can dig deeper into the concept and learn about its less well-known details. That way we could use strings more efficiently, or think of some way to implement our own garbage collection for an application that creates a great many literals and String objects.

Ankit_ceo2
  • You will not understand the string pool any better by looking at actual code. Basically, it’s a kind of hash map. Specifically, the JVM knows this special map to still support garbage collection for the contained strings. If you understand these principles, there is no need to look at the code. On the other hand, if you don’t understand, the code will not help you. – Holger Feb 19 '16 at 09:02
  • @Holger OK. I do know how a hash map works. I was looking for things I have not read about so far by studying the actual implementation. – Ankit_ceo2 Feb 19 '16 at 09:47
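
To make the "kind of hash map" idea concrete, here is a rough user-level sketch of a pool that does not keep its strings alive on its own. It is an illustration built on `java.util.WeakHashMap`, not a description of how any JVM implements its string pool; `WeakStringPool` and its `intern` method are hypothetical names.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Hypothetical user-level pool; NOT the JVM's own string pool.
final class WeakStringPool {
    // Keys are held weakly by WeakHashMap; values are wrapped in a
    // WeakReference so the map never holds a strong reference to a string.
    private final WeakHashMap<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns one canonical instance for all strings equal to s.
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First time we see this content (or the old instance was collected).
            canonical = s;
            pool.put(s, new WeakReference<>(s));
        }
        return canonical;
    }

    public static void main(String[] args) {
        WeakStringPool p = new WeakStringPool();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a == b);                     // distinct objects
        System.out.println(p.intern(a) == p.intern(b)); // one canonical instance
    }
}
```

Once no other code references a pooled string, the entry becomes eligible for collection, which is the property the comment above describes for the JVM's map.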

2 Answers

20

I am sorry to disappoint you, but the Java String Pool is not an actual Java class; it is implemented inside the JVM, i.e. it is written as C++ code. If you look at the source code of the String class (pretty much all the way down), you will see that the intern() method is native. You will have to go through some JVM code to get more information.

Edit: Some implementation can be found here (C++ header, C++ implementation). Search for StringTable.

Edit2: As Holger pointed out in the comments, this is not a hard requirement of the JVM implementation. So it is possible to have a JVM that implements the String Pool differently, e.g. using an actual Java class. However, all commonly used JVMs I am aware of implement it in the JVM's C++ code.
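
Whatever the JVM does internally, the behaviour of `intern()` can be observed from plain Java. A minimal sketch (the class and method names are made up for illustration):

```java
// Hypothetical demo class; only String.intern() is real API here.
class InternDemo {
    // A string built at run time is a different object than the literal.
    static boolean sameBeforeIntern() {
        String literal = "hello";
        String built = new String(new char[] {'h', 'e', 'l', 'l', 'o'});
        return built == literal;
    }

    // intern() returns the pooled instance, which is the one the literal uses.
    static boolean sameAfterIntern() {
        String literal = "hello";
        String built = new String(new char[] {'h', 'e', 'l', 'l', 'o'});
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(sameBeforeIntern()); // false
        System.out.println(sameAfterIntern());  // true
    }
}
```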

MartinS
  • Thanks for the information. Yes, in the source code the method is native. Can you provide a link from which I can download the native source code? – Ankit_ceo2 Feb 19 '16 at 06:15
  • I was unsure whether it is written in C or C++ ^^ But actually it is C++ so I will fix it. – MartinS Feb 19 '16 at 06:16
  • 1
    @Ankit_ceo2 you should be able to download it here http://download.java.net/openjdk/jdk7/ – MartinS Feb 19 '16 at 06:17
  • 1
    @Ankit_ceo2 I updated my answer with some information where to find it – MartinS Feb 19 '16 at 06:29
  • 1
    Well, there are JVMs written in Java. However, the string pool is a private internal optimization detail of HotSpot, so other JVMs probably won't have it. – Jörg W Mittag Feb 19 '16 at 09:05
  • 2
    @Jörg W Mittag: the fact that equal string literals are represented by the same instance, which is also the same instance returned by `String.intern()` when being invoked on an equal instance, is demanded by the specification. So every JVM must have some kind of string pool or an equivalent. Only the way it actually works, is implementation dependent. – Holger Feb 19 '16 at 10:09
  • @Holger: interesting. I searched the entire JLS and JVMS for the word "pool" and all I got was hits for the constant pool in the JVMS and a reference to a paper on Java Generics Wildcards that was published in the proceedings of the POOL conference in the JLS. – Jörg W Mittag Feb 19 '16 at 10:47
  • 1
    @Jörg W Mittag: Don’t get stuck to the word “pool”. See [JLS §3.10.5. String Literals](https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.5): “*Moreover, a string literal always refers to the* same *instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern*”. As said, it’s up to the JVM how to implement this, but semantically, all string literals always form some kind of pool. – Holger Feb 19 '16 at 11:02
  • 1
    It’s also reflected by the JVM specification in [§5.1. The Run-Time Constant Pool](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.1): “*The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class `String` (JLS §3.10.5). In addition, if the method `String.intern` is called on any string, the result is a `reference` to the same class instance that would be returned if that string appeared as a literal.*” – Holger Feb 19 '16 at 11:05
  • @Holger: Okay, so that basically means that this answer is wrong in claiming that there is no [Java code implementing the string pool](https://kenai.com/projects/maxine/sources/maxine/content/com.oracle.max.vm/src/com/sun/max/vm/classfile/constant/StringConstant.java). – Jörg W Mittag Feb 19 '16 at 11:11
  • 1
    @Jörg W Mittag: that Java code deals with a string constant of a particular class file but not with the global pool implemented by the JVM. The constant pool of a class interacts with the JVM’s global string pool, but these are still distinct pools. Besides that, this answer refers to Oracle’s well-known reference JVM, but doesn’t have to apply to all JVMs of the world. – Holger Feb 19 '16 at 11:19
  • @Holger: Ah, thanks for catching that. This looks promising: https://kenai.com/projects/maxine/sources/maxine/content/com.oracle.max.vm/src/com/sun/max/vm/classfile/constant/SymbolTable.java – Jörg W Mittag Feb 19 '16 at 11:33
  • @Holger thanks for your comments. You are right, I included them in my answer. – MartinS Feb 19 '16 at 17:46
  • @JörgWMittag Maxine is a research VM that is no longer under development. Besides that, it is written in Java, so the String Pool you are referring to here is still part of the VM and not part of the language/runtime. – MartinS Feb 19 '16 at 17:47
6

You can go through this article: Strings, Literally

When a .java file is compiled into a .class file, any String literals are noted in a special way, just as all constants are. When a class is loaded (note that loading happens prior to initialization), the JVM goes through the code for the class and looks for String literals. When it finds one, it checks to see if an equivalent String is already referenced from the heap. If not, it creates a String instance on the heap and stores a reference to that object in the constant table. Once a reference is made to that String object, any references to that String literal throughout your program are simply replaced with the reference to the object referenced from the String Literal Pool.

So, in the example shown above, there would be only one entry in the String Literal Pool, which would refer to a String object that contained the word "someString". Both of the local variables, one and two, would be assigned a reference to that single String object. You can see that this is true by looking at the output of the above program. While the equals() method checks to see if the String objects contain the same data ("someString"), the == operator, when used on objects, checks for referential equality - that means that it will return true if and only if the two reference variables refer to the exact same object. In such a case, the references are equal. From the above output, you can see that the local variables, one and two, not only refer to Strings that contain the same data, they refer to the same object.
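
The example the quoted passage refers to is not reproduced here. A minimal sketch of what such a program might look like, assuming two local variables named one and two as the text describes (the class and method names are hypothetical):

```java
// Hypothetical reconstruction; not the article's original listing.
class LiteralDemo {
    // equals() compares the character data of the two strings.
    static boolean sameContents() {
        String one = "someString";
        String two = "someString";
        return one.equals(two);
    }

    // == compares references: both literals resolve to one pooled object.
    static boolean sameObject() {
        String one = "someString";
        String two = "someString";
        return one == two;
    }

    public static void main(String[] args) {
        System.out.println(sameContents()); // true
        System.out.println(sameObject());   // true
    }
}
```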

Rahul Tripathi
  • Very poor quality citation: numerous errors, and the implementation has changed over time. For example, much of the work is done by the compiler, not the JVM, and the JVM examines the constant table in the .class file, not the bytecode. – user207421 Jan 14 '18 at 09:23