Different JVMs with different encodings

Question

Suppose I have two JVMs running: one is a client and the other is a server. Suppose the client and server are using different encodings. If I write a program on the client that sends Strings across the network to the server, is it necessary to encode each String in the server's encoding before the client sends it? Or would this be pointless, given that the two are using different encodings in the first place? How do clients and servers typically handle exchanging messages when the two sides use different encodings?

anonymous · Accepted Answer · 2014-02-27T22:02:04.180

I suppose you are running into what is called the platform default encoding. For example, new String(byte[]) converts the bytes to a String using the default encoding. Different servers may be set up differently and so have different platform default encodings.

To keep servers from behaving differently because of their default encodings, specify the encoding explicitly when converting a byte[] to a String. If you don't know which encoding the bytes are in, that is another matter, but at least you get consistent results for the same byte stream.

For example, to convert a String to a UTF-8 byte stream use getBytes("UTF-8"), and to get the String back use new String(byte[], "UTF-8").
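As a rough illustration of the point above, here is a minimal sketch of a round trip with an explicitly agreed charset, plus what happens when the receiver decodes with a different one. The class and method names (ExplicitCharsetDemo, encode, decode) are made up for this example, and ISO-8859-1 simply stands in for "some other encoding" a server might default to.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class ExplicitCharsetDemo {

    // Encode with a charset both sides have agreed on (UTF-8 here).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same agreed charset, never the platform default.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s, then a euro sign, written as escapes
        String original = "Gr\u00fc\u00dfe \u20ac";
        byte[] wire = encode(original);

        // Same charset on both ends: the String survives the trip.
        System.out.println(original.equals(decode(wire))); // prints true

        // Mismatch: UTF-8 bytes decoded as ISO-8859-1 garble the non-ASCII characters.
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(garbled)); // prints false

        // What new String(byte[]) would silently use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

Using the StandardCharsets constants instead of the "UTF-8" string name also avoids the checked UnsupportedEncodingException that the String-name overloads declare.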

score 1 · Answer 2 · edited May 23 '17 at 12:16

Java Strings are always Unicode internally (a sequence of UTF-16 code units), whatever the platform default encoding is (read this answer).

The critical part is the transmission of the String, which is likely to happen over a byte-based stream. Converting a String to a byte[] means choosing an encoding; if you don't specify one, the platform default is used. You should use UTF-8 in most cases.

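import java.nio.charset.StandardCharsets; // Java 7+

// On the client side: encode with an explicit charset
byte[] bytes = myString.getBytes(StandardCharsets.UTF_8);
serverStream.write(bytes);
// On the server side: decode with the same charset
byte[] bytes = /* read bytes */;
String myString = new String(bytes, StandardCharsets.UTF_8);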

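I suggest using a DataOutputStream/DataInputStream pair, whose writeUTF and readUTF methods transmit Strings independently of the platform default encoding. They use a length-prefixed, modified UTF-8 format, so one writeUTF call is limited to 65535 encoded bytes.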
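A minimal sketch of that suggestion, using in-memory streams in place of a real socket so it runs on its own. The class and method names (WriteUtfDemo, send, receive) are hypothetical; in a real client and server, the streams would come from the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; the byte array stands in for the network.
class WriteUtfDemo {

    // "Client": writeUTF emits a 2-byte length followed by modified UTF-8 bytes.
    static byte[] send(String message) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(message);
        }
        return buffer.toByteArray();
    }

    // "Server": readUTF reads the length prefix, then decodes exactly that many bytes.
    static String receive(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "Gr\u00fc\u00dfe \u20ac";
        System.out.println(original.equals(receive(send(original)))); // prints true
    }
}
```

The length prefix means the receiver knows where each String ends, which raw getBytes/write calls do not give you. The format is only guaranteed to be readable by readUTF, so it fits best when both ends are Java.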

answered Feb 27 '14 at 21:50 by Tobias · edited May 23 '17 at 12:16 by Community

Clear answer, though DataInputStream and DataOutputStream are meant more for binary I/O of Java data types; using them for String I/O is a bit of an abuse. Better to use `new InputStreamReader(InputStream, "UTF-8")` and `new OutputStreamWriter(OutputStream, "UTF-8")`. – Joop Eggen Feb 27 '14 at 22:07
@still_learning What do you mean when you say the JVM always uses UTF in Strings? Are you saying that the JVM always uses UTF to convert bytes to a String? I have certainly seen different default encodings on different servers. So if you have code like `new String(byte[])`, it can yield different Strings on different servers even for the same byte stream. – anonymous Feb 27 '14 at 22:07
Strings in Java are always Unicode, so they can represent all characters; `byte[]` is binary data. In Java one therefore has to say which encoding the bytes are in to convert between bytes and String. Unfortunately, I/O classes often have an overload without an encoding parameter, in which case the operating system's default encoding is used. – Joop Eggen Feb 27 '14 at 22:12
@JoopEggen Absolutely true as long as you just want to read `String`s (or `char`s). And thanks for explaining my answer to @anonymous. – Tobias Feb 28 '14 at 09:01
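
The Reader/Writer approach suggested in the comments above could look roughly like this. It is only a sketch: the class and method names (ReaderWriterDemo, writeLines, readLines) are invented, the newline-per-message framing is an assumption, and a byte array again stands in for the socket streams.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; assumes one message per line.
class ReaderWriterDemo {

    // "Client": wrap the byte stream in a Writer with an explicit charset.
    static byte[] writeLines(String... lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(buffer, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        }
        return buffer.toByteArray();
    }

    // "Server": wrap the byte stream in a Reader with the same charset.
    static List<String> readLines(byte[] wire) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = writeLines("hello", "Gr\u00fc\u00dfe \u20ac");
        System.out.println(readLines(wire).size()); // prints 2
    }
}
```

Neither side touches the platform default here, so the client and server can run with any default encoding as long as both name UTF-8 when wrapping their streams.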
