
I have read quite a few answers about the String.intern() method but still didn't find what I was looking for. intern() is a native method, and the docs say:

Returns a canonical representation for the string object.

As I understand it, String is a wrapper around a char[] array. When we define a string with a literal, the reference points directly to the underlying char[].

When we create a string using new String(), it creates an object on the heap, and one field of that object holds the "value", which internally refers to the underlying char[] array.

My questions are :

  • Is the char[] cached in something like a HashSet, so that the pool never holds a duplicate char[]?
  • When intern() is called, as in String s = new String().intern(), does the local reference now point directly to the char[], leaving the newly created String object without a live reference? (My assumption is based on "hello" == "hello".intern() being true.)

The real question is: how does the intern() method work internally?
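The identity checks behind these assumptions can be written out as a minimal sketch. The class and method names here are made up for illustration; only `String.intern()` and the `==` comparison come from the question itself.

```java
// Hypothetical demo class; it only exercises reference identity on Strings.
class InternIdentityDemo {

    // True only if both references point to the very same object.
    static boolean isSameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";               // literals are interned by the JVM
        String copy = new String("hello");      // a distinct String object
        String interned = copy.intern();        // the canonical instance from the pool

        System.out.println(isSameInstance(literal, copy));     // false
        System.out.println(isSameInstance(literal, interned)); // true
        System.out.println(copy.equals(literal));               // true
    }
}
```

Note that in every case the variables hold references to String objects, not to a char[]; the sketch says nothing about how the pool itself is stored.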

Zabuzard
Priyak Dey
  • You can imagine it as some kind of hash-set, yes. The string pool implementation is internal and up to the JVM, so different JVMs might implement it differently. But yes, if you create a string via the literal `"hello"` (or using `intern()`), it is put into that cache, like `pool.add(...)` on a hash-set. `intern()` will not only put your string into the cache but, if an equal string is already in the cache, will instead return the one from the cache. – Zabuzard Jan 31 '20 at 15:27
  • So if you use literals or `intern()`, it will first check the internal string cache and, if the string is already there, give you exactly that instance out of the cache. Otherwise it puts the string into the cache for the next user to benefit from it. – Zabuzard Jan 31 '20 at 15:29
  • Note that in Java, strings are not just a simple wrapper around a `char[]`, unlike C/C++. There is more going on internally. A string also has two different internal representations which it can switch between (dealing with different kinds of encodings). – Zabuzard Jan 31 '20 at 15:30
  • The part where you mentioned different implementations/representations: that was introduced in Java 9, right? I am actually still working on Java 8 and talking in reference to that. I'll update the question if the JDK version is required. – Priyak Dey Jan 31 '20 at 15:33
  • Recently I did a small example to check how this works and what the references are: `String s1 = "hello"; String s2 = new String("hello");`. As expected, `s1 == s2` is false. – Priyak Dey Jan 31 '20 at 15:34
  • Then I used reflection to get the reference to the underlying char[] array holding the data. Both s1 and s2 refer to the same underlying char[]. This tells me there are two objects in memory now: 1. A String object referenced by s2. 2. A char array object holding the actual value, referenced by s1 from the thread stack and by the "value" field defined in s2. So essentially there is just one char[] on the heap right now. – Priyak Dey Jan 31 '20 at 15:37
  • You can look up the implementation; Java is open-source. And, as said, it doesn't store them as `char[]` but as `byte[]`, with a corresponding encoding (like UTF-16 etc.). And yes, the "two implementations" (UTF-16 or Latin-1) came with Java 9. Here is the source: [String.java](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java). Line 150 shows the internal data representation, `private final byte[] value;`. And `intern` is natively implemented, `public native String intern();`. – Zabuzard Jan 31 '20 at 15:43
  • `intern()` is a `native` method, so we don't know how it works, and it may work differently in different JVM implementations/versions. – Bohemian Jan 31 '20 at 15:53
  • @Zabuza This is OpenJDK. I am using Oracle JDK. I normally check the source from the rt.jar itself. In version 1.8 it is still using char[]: `/** The value is used for character storage. */ private final char value[];` – Priyak Dey Jan 31 '20 at 16:14
  • @Bohemian Thanks for the link. I think I somewhat understand what is happening. The pool was previously maintained in the PermGen and is now maintained on the heap. That kind of explains why both the literal and the String object's value refer to the same array. Thanks, that actually helps a bit to answer the question. :) – Priyak Dey Jan 31 '20 at 16:18
  • By the way, if you want a more in-depth answer you should post a new question with a more on-topic title, and then concentrate on what you actually want to know. As in, _"how does `intern()` work internally"_, or _"how is `intern()` implemented in Oracle JDK"_, or _"how is string interning implemented internally in Oracle JDK"_. Then your question also will not be closed as a duplicate. – Zabuzard Feb 01 '20 at 10:32
  • Thanks @Holger. That was the answer I was looking for. Thanks :) – Priyak Dey Feb 03 '20 at 19:32
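
The hash-set analogy from the comments can be sketched as a toy pool in plain Java. This is only a mental model under stated assumptions: `ToyStringPool` and its `intern` method are hypothetical names, and a real JVM implements the pool natively rather than with a `HashMap`, so nothing here reflects the actual Oracle JDK or OpenJDK code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a string pool; not the JVM's implementation.
class ToyStringPool {

    // Maps each distinct string content to its one canonical instance.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s, registering s if none exists yet.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s); // null if s was newly added
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        ToyStringPool pool = new ToyStringPool();
        String first = new String("hello");
        String second = new String("hello");

        System.out.println(pool.intern(first) == first);   // true: first becomes canonical
        System.out.println(pool.intern(second) == first);  // true: the cached instance is returned
        System.out.println(pool.intern(second) == second); // false: second was never added
    }
}
```

The lookup is keyed on string content (`equals`/`hashCode`), while the result is compared by identity, which is the same distinction as `s1.equals(s2)` versus `s1 == s2` discussed above.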

0 Answers