That is not how variable assignments work
Thinking that assigning a 6-byte array to a variable will limit the length of any other array assigned to the same variable shows a fundamental lack of comprehension of what variables are and how they work.
Really think about why you believe that assigning a fixed-length array to a variable would limit the length of the next array assigned to it. A variable only holds a reference to an array; the length belongs to the array object, not to the variable.
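A minimal sketch of the point about assignment. The class and method names (ReassignDemo, lengthAfterReassign) are made up for illustration; the only thing being shown is that reassigning a byte[] variable swaps which array it refers to.

```java
import java.nio.charset.StandardCharsets;

public class ReassignDemo {
    // Hypothetical helper: reassigns a local variable and reports the new array's length.
    static int lengthAfterReassign() {
        byte[] data = new byte[6];   // data refers to a 6-element array
        data = new byte[22];         // data now refers to a different, 22-element array
        return data.length;          // the length comes from the array, not the variable
    }

    public static void main(String[] args) {
        byte[] data = new byte[6];
        System.out.println(data.length);   // 6
        // Assigning the result of getBytes replaces the reference; nothing is truncated.
        data = "Hello World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);   // 12
        System.out.println(lengthAfterReassign());   // 22
    }
}
```

The original 6-element array is simply no longer referenced once the variable is reassigned.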
Strings are Unicode in Java
Strings in Java are Unicode and internally represented as UTF-16 which means they are 2 or 4 bytes per character in memory.
When they are converted to a byte array, the number of bytes that represents the string is determined by the encoding used when converting to the byte[].
Always specify an appropriate character encoding when converting Strings to byte arrays to get what you expect.
But even then, UTF-8 does not guarantee a single byte per character, and ASCII cannot represent non-ASCII Unicode characters.
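To make the "2 or 4 bytes per character" point concrete, here is a rough sketch that compares char count, code point count, and encoded byte counts for three one-character strings. The class and helper names (CodeUnitDemo, utf8Length, utf16Length) are illustrative only, and the non-ASCII characters are written as Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class CodeUnitDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 big-endian, without a byte-order mark.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ko = "\u3053";            // HIRAGANA LETTER KO (U+3053), inside the BMP
        String emoji = "\uD83D\uDE00";   // U+1F600, outside the BMP, stored as a surrogate pair
        for (String s : new String[] {"A", ko, emoji}) {
            System.out.println("chars=" + s.length()
                    + " codePoints=" + s.codePointCount(0, s.length())
                    + " utf8Bytes=" + utf8Length(s)
                    + " utf16Bytes=" + utf16Length(s));
        }
    }
}
```

Note that String.length() counts UTF-16 code units, so the character outside the BMP reports a length of 2 even though it is a single code point.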
Character encoding is tricky
The ubiquitous internet encoding standard is UTF-8. It will be correct in 99.9999999% of all cases, and in the cases where it isn't, converting from UTF-8 to the correct encoding is trivial because UTF-8 is so well supported in every toolchain.
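As a sketch of what "converting UTF-8 to the correct encoding" can look like in Java: decode the bytes with the source charset, then encode with the target one. The transcode helper and the class name are hypothetical, and this assumes the target encoding can represent every character in the text; characters it cannot represent would be replaced.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeDemo {
    // Hypothetical helper: decode with one charset, re-encode with another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9 (e with acute accent), written as an escape.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(utf8));    // [99, 97, 102, -61, -87]
        System.out.println(Arrays.toString(latin1));  // [99, 97, 102, -23]
    }
}
```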
Learn to make everything final and you will have a much easier time and less confusion.
import com.google.common.base.Charsets;

import javax.annotation.Nonnull;
import java.util.Arrays;

public class Scratch
{
    public static void main(final String[] args)
    {
        printWithEncodings("Hello World!");
        printWithEncodings("こんにちは世界!");
    }

    private static void printWithEncodings(@Nonnull final String s)
    {
        System.out.println("s = " + s);
        final byte[] defaultEncoding = s.getBytes(); // never do this, you do not know what you will get!
        // for ASCII-only input, ISO-8859-1, US-ASCII and UTF-8 all produce the same single-byte representations
        final byte[] iso88591Encoding = s.getBytes(Charsets.ISO_8859_1);
        final byte[] asciiEncoding = s.getBytes(Charsets.US_ASCII);
        final byte[] utf8Encoding = s.getBytes(Charsets.UTF_8);
        final byte[] utf16Encoding = s.getBytes(Charsets.UTF_16);
        System.out.println("Arrays.toString(defaultEncoding) = " + Arrays.toString(defaultEncoding));
        System.out.println("Arrays.toString(iso88591) = " + Arrays.toString(iso88591Encoding));
        System.out.println("Arrays.toString(asciiEncoding) = " + Arrays.toString(asciiEncoding));
        System.out.println("Arrays.toString(utf8Encoding) = " + Arrays.toString(utf8Encoding));
        System.out.println("Arrays.toString(utf16Encoding) = " + Arrays.toString(utf16Encoding));
    }
}
results in
s = Hello World!
Arrays.toString(defaultEncoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(iso88591) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(asciiEncoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(utf8Encoding) = [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
Arrays.toString(utf16Encoding) = [-2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33]
s = こんにちは世界!
Arrays.toString(defaultEncoding) = [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -127, -81, -28, -72, -106, -25, -107, -116, 33]
Arrays.toString(iso88591) = [63, 63, 63, 63, 63, 63, 63, 33]
Arrays.toString(asciiEncoding) = [63, 63, 63, 63, 63, 63, 63, 33]
Arrays.toString(utf8Encoding) = [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -127, -81, -28, -72, -106, -25, -107, -116, 33]
Arrays.toString(utf16Encoding) = [-2, -1, 48, 83, 48, -109, 48, 107, 48, 97, 48, 111, 78, 22, 117, 76, 0, 33]
Always specify the Charset encoding!
String.getBytes(Charset) is always the correct way to convert a String to bytes. Use whatever encoding you need.
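One way to check that an encoding is actually the one you need is a round trip: encode with an explicit Charset, decode with the same one, and compare. This sketch uses java.nio.charset.StandardCharsets from the JDK (available since Java 7) rather than the Guava Charsets class used above; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: true if the string is unchanged after encoding and decoding.
    static boolean survivesRoundTrip(String s, Charset cs) {
        return s.equals(new String(s.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        String hello = "Hello World!";
        String konnichiwa = "\u3053\u3093\u306b\u3061\u306f";   // "konnichiwa" in hiragana
        System.out.println(survivesRoundTrip(hello, StandardCharsets.US_ASCII));       // true
        System.out.println(survivesRoundTrip(konnichiwa, StandardCharsets.UTF_8));     // true
        // US-ASCII cannot represent hiragana, so getBytes substitutes '?' and the text is lost.
        System.out.println(survivesRoundTrip(konnichiwa, StandardCharsets.US_ASCII));  // false
    }
}
```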
Internally supported encodings for JDK7