1) Strings are objects, which typically contain a char array and the string's length. The character array is usually implemented as a contiguous array of 16-bit words, each one containing a Unicode character in native byte order.
2) Assigning a character value to an integer converts the 16-bit Unicode character code into its integer equivalent. Thus 'c', which is U+0063, becomes 0x0063, or 99 (see the example after this list).
3) Since each String is an object, it contains information other than its class members (e.g., a class descriptor word, a lock/semaphore word, etc.).
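As a quick check of point 2, the implicit widening conversion from char to int can be observed directly (a minimal, self-contained example):

    public class CharToInt {
        public static void main(String[] args) {
            char c = 'c';   // U+0063
            int code = c;   // implicit widening conversion from char to int
            System.out.println(code);                      // prints 99
            System.out.println(Integer.toHexString(code)); // prints 63
        }
    }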
ADDENDUM
The object contents depend on the JVM implementation (which determines the inherent overhead associated with each object), and how the class is actually coded (i.e., some libraries may be more efficient than others).
EXAMPLE
A typical implementation will allocate an overhead of two words per object instance (for the class descriptor/pointer, and a semaphore/lock control word); a String object also contains an int length and a char[] array reference. The actual character contents of the string are stored in a second object, the char[] array, which in turn is allocated two words, plus an array length word, plus as many 16-bit char elements as needed for the string (plus any extra chars that were left hanging around when the string was created).
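As a rough sketch of the layout described above (this is not the real java.lang.String; the class and field names are illustrative, and the per-object header words are added by the JVM rather than declared in the class):

    // Illustrative sketch only -- not the actual java.lang.String source.
    // The JVM adds its per-object overhead (class descriptor/pointer and
    // lock word) on top of the fields declared here.
    final class SketchString {
        private final char[] value;  // reference to a separate char[] object
        private final int offset;    // start of this string within the array
        private final int count;     // number of chars actually used

        SketchString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        int length() {
            return count;
        }

        char charAt(int index) {
            return value[offset + index];
        }
    }

The offset/count pair is what lets several strings share one backing array, which is also where the "extra chars left hanging around" come from.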
ADDENDUM 2
The assumption that one char represents one Unicode character is only true in most cases. It would imply UCS-2 encoding, and that was true before 2005. But by now Unicode has grown larger, and Strings have to be encoded using UTF-16 -- where, alas, a single Unicode character may use two chars in a Java String.
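To see this, take a code point outside the Basic Multilingual Plane, e.g. U+1D11E (MUSICAL SYMBOL G CLEF), which UTF-16 stores as a surrogate pair:

    public class SurrogateDemo {
        public static void main(String[] args) {
            // U+1D11E lies outside the BMP, so UTF-16 encodes it
            // as a surrogate pair occupying two char elements.
            String clef = new String(Character.toChars(0x1D11E));
            System.out.println(clef.length());                          // prints 2
            System.out.println(clef.codePointCount(0, clef.length()));  // prints 1
        }
    }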
Take a look at the actual source code for Apache's implementation, e.g. at:
http://www.docjar.com/html/api/java/lang/String.java.html