Trying to shed some more light for future visitors.
Rule of thumb: the server and the client HAVE TO agree on the encoding scheme. If the client sends data encoded with one scheme and the server reads it with another, the expected results can NEVER be achieved.
An important note for anyone who tries to test this: do not encode in ASCII on the client side and decode in UTF8 on the server side. UTF8 is backward compatible with ASCII, so everything will appear to work and you may conclude that the rule of thumb is wrong. It is not. Use UTF8 on the client side and UTF16 on the server side instead, and you will see the problem.
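To make the ASCII trap concrete, here is a minimal sketch using only the standard library. The class and method names are made up for illustration. It checks that plain ASCII text produces the same bytes under ASCII and UTF8, and that the same text no longer survives when it is encoded as UTF8 and decoded as UTF16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative names only; nothing here comes from the original question.
class AsciiCompatDemo {
    // True when the text produces identical bytes under ASCII and UTF-8.
    static boolean sameBytesInAsciiAndUtf8(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.US_ASCII),
                text.getBytes(StandardCharsets.UTF_8));
    }

    // Encodes as UTF-8 (the "client") and decodes as UTF-16 (the "server").
    static String encodeUtf8DecodeUtf16(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String text = "hello";
        // ASCII vs UTF-8: same bytes, so this test hides the problem.
        System.out.println(sameBytesInAsciiAndUtf8(text));            // true
        // UTF-8 vs UTF-16: the text does not survive the round trip.
        System.out.println(text.equals(encodeUtf8DecodeUtf16(text))); // false
    }
}
```

This is why an ASCII/UTF8 pair makes a poor test: the mismatch only shows up once the two encodings actually disagree on the bytes.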
Encoding with sockets
I guess the single most important thing to understand is this: in the end you always send BYTES over the socket, but what those bytes mean depends on how they were encoded.
For example, if I send input to the server (over a client-server socket) from my Windows command prompt, the data will be encoded using some encoding scheme (I really do not know which). If I send data to the server from another client program, I can specify the encoding scheme I want for my client socket's output stream, and all the data will be converted into BYTES using that scheme and sent over the socket.
Either way, I am still sending BYTES over the wire, but they are encoded using the scheme I specified. If the server uses a different encoding scheme while reading from the socket's input stream, the expected results cannot be achieved; if it uses the same encoding scheme as the client, everything will be perfect.
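As a rough illustration of "same text, different bytes", the sketch below prints the bytes that would go over the wire for one character under two encodings. The helper name toHex is hypothetical, and UTF-16BE is used instead of plain UTF16 only to keep the byte-order mark out of the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper; not part of any library.
class WireBytesDemo {
    // Returns the encoded bytes of the text as space-separated hex pairs.
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the character 'e' with an acute accent
        System.out.println(toHex(text, StandardCharsets.UTF_8));    // c3 a9
        System.out.println(toHex(text, StandardCharsets.UTF_16BE)); // 00 e9
    }
}
```

A server that expects one of these byte sequences and receives the other has no way to recover the original character.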
Answering this question
In Java, there are special "bridge" classes (read here), InputStreamReader and OutputStreamWriter, which you can use to specify the encoding of a stream.
PLEASE NOTE: in Java, InputStream and OutputStream are BYTE streams, so everything read from or written to them is BYTES. You cannot specify an encoding on InputStream or OutputStream objects; for that, use Java's bridge classes.
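The bridge classes can be tried without any network at all. The following is a minimal sketch, assuming an in-memory byte array stands in for the socket; the class name and the writeThenRead helper are invented for this example. It writes text through an OutputStreamWriter and reads it back through an InputStreamReader, once with matching charsets and once with mismatched ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative names only; the byte array plays the role of the wire.
class BridgeStreamDemo {
    static String writeThenRead(String text, Charset writerCharset, Charset readerCharset) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            // Characters -> bytes, using the writer's charset.
            try (Writer writer = new OutputStreamWriter(wire, writerCharset)) {
                writer.write(text);
            }
            StringBuilder received = new StringBuilder();
            // Bytes -> characters, using the reader's charset.
            try (Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(wire.toByteArray()), readerCharset)) {
                int c;
                while ((c = reader.read()) != -1) {
                    received.append((char) c);
                }
            }
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac5";
        String same = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        System.out.println(text.equals(same));  // true
        System.out.println(text.equals(mixed)); // false
    }
}
```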
Below are client and server code snippets that show how to specify the encoding on the client's output stream and the server's input stream.
As long as I specify the same encoding at both ends, everything will be perfect.
Client side:
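// Host and port are placeholders; use the host and port your server actually listens on.
Socket clientSocket = new Socket("abc.com", 25050);
// Bridge the socket's BYTE stream to a character stream that encodes everything as UTF8.
OutputStreamWriter clientSocketWriter = new OutputStreamWriter(clientSocket.getOutputStream(), "UTF8");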
Server side:
ServerSocket serverSocket = new ServerSocket(8001);
Socket clientSocket = serverSocket.accept();
// PLEASE NOTE: the important part is that I specify the encoding for the socket's input stream. Java's InputStream is a BYTE stream,
// so to specify the encoding I use the Java I/O bridge class InputStreamReader and pass it my UTF8 encoding.
// The data is still read from the client socket as BYTES, but those bytes are decoded as UTF8.
// If I specify a different encoding here than the one the client uses for its output stream, the data cannot be read properly and may show up as "?".
InputStreamReader clientSocketReader = new InputStreamReader(clientSocket.getInputStream(), "UTF8");
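
For anyone who wants to run the whole thing on one machine, here is a self-contained sketch that puts both sides in a single process over the loopback interface. Everything in it is an assumption made for the demo rather than part of the snippets above: the class name, the sendOverLoopback helper, the use of an ephemeral port (port 0), and the single-threaded flow, which only works because the message is small enough to fit in the socket buffer.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative names only; a real client and server would be separate programs.
class SocketEncodingDemo {
    static String sendOverLoopback(String message, Charset clientCharset, Charset serverCharset) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket serverSocket = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, serverSocket.getLocalPort());
             Socket accepted = serverSocket.accept()) {
            // Client side: encode characters into bytes with the client's charset.
            try (Writer writer = new OutputStreamWriter(client.getOutputStream(), clientCharset)) {
                writer.write(message);
            } // closing the writer flushes and closes the client socket
            // Server side: decode the received bytes with the server's charset.
            StringBuilder received = new StringBuilder();
            try (Reader reader = new InputStreamReader(accepted.getInputStream(), serverCharset)) {
                int c;
                while ((c = reader.read()) != -1) {
                    received.append((char) c);
                }
            }
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "na\u00efve";
        String same = sendOverLoopback(message, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = sendOverLoopback(message, StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        System.out.println(message.equals(same));  // true
        System.out.println(message.equals(mixed)); // false
    }
}
```

With the same charset on both ends the message arrives intact; with UTF8 on the client and UTF16 on the server it does not, which is the rule of thumb in action.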