8

I'm trying to send a string containing special characters through a TcpClient (byte[]). Here's an example:

  • Client enters "amé" in a textbox
  • Client converts string to byte[] using a certain encoding (I've tried all the predefined ones plus some like "iso-8859-1")
  • Client sends byte[] through TCP
  • Server receives and outputs the string reconverted with the same encoding (to a listbox)

Edit :

I forgot to mention that the resulting string was "am?".

Edit-2 (as requested, here's some code):

@DJKRAZE here's a bit of code :

byte[] buffer = Encoding.ASCII.GetBytes("amé");
server.Client.Send(buffer); // server is a TcpClient

On the server side:

byte[] buffer = new byte[1024];
Client.Receive(buffer);
string message = Encoding.ASCII.GetString(buffer);
ListBox1.Items.Add(message);

The string that appears in the listbox is "am?"

=== Solution ===

Encoding encoding = Encoding.GetEncoding("iso-8859-1");
byte[] message = encoding.GetBytes("babé");

Update:

Simply using Encoding.UTF8.GetBytes("ééé"); on both sides works like a charm.

Philippe Paré
  • Philippe do you have existing code.. ? why is it that people ask questions online here and expect us to know what it is they are talking about..? we can't see what you are doing nor do we know what your code looks like.. so post what it is you are working with... – MethodMan Feb 26 '13 at 05:57
  • @DJKRAZE here's a bit of code : byte[] buffer = Encoding.ASCII.GetBytes("amé"); server.Client.Send(buffer); On the server side: byte[] buffer = new byte[1024]; Client.Receive(buffer); string message = Encoding.ASCII.GetString(buffer); ListBox1.Items.Add(message); The string that appears in the listbox is "am?" – Philippe Paré Feb 26 '13 at 06:00
  • ASCII will not do here - it doesn't support accented characters. Try UTF-8 instead. – 500 - Internal Server Error Feb 26 '13 at 06:09
  • @500-InternalServerError tried all predefined ones, including utf-8.. :S – Philippe Paré Feb 26 '13 at 06:11
  • Does it work with UTF-8 if you strip out the middle man (the socket connection)? – 500 - Internal Server Error Feb 26 '13 at 06:21

3 Answers

11

Never too late to answer a question I think, hope someone will find answers here.

C# uses 16 bit chars, and ASCII truncates them to 8 bit, to fit in a byte. After some research, I found UTF-8 to be the best encoding for special characters.

//data to send via TCP or any stream/file
byte[] string_to_send = Encoding.UTF8.GetBytes("amé");

//when receiving, pass the received byte array to this to get the string back
string received_string = Encoding.UTF8.GetString(string_to_send);
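
To show the same idea over an actual socket, here is a minimal sketch that sends "amé" across a loopback TCP connection and decodes it with UTF-8 on the other end. The class name Utf8OverTcpDemo and the helper RoundTrip are made up for illustration, and the single Receive call assumes the whole (tiny) message arrives in one read, which real code should not rely on.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class Utf8OverTcpDemo
{
    // Hypothetical helper: sends text over loopback TCP and returns what the receiver decodes.
    public static string RoundTrip(string text)
    {
        // Port 0 lets the OS pick any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;

            using (var sender = new TcpClient())
            {
                sender.Connect(IPAddress.Loopback, port);
                using (TcpClient receiver = listener.AcceptTcpClient())
                {
                    // Encode with UTF-8 on the sending side.
                    byte[] payload = Encoding.UTF8.GetBytes(text);
                    sender.Client.Send(payload);

                    // Receive returns the number of bytes actually read;
                    // decode only that many bytes, not the whole 1024-byte buffer.
                    byte[] buffer = new byte[1024];
                    int received = receiver.Client.Receive(buffer);
                    return Encoding.UTF8.GetString(buffer, 0, received);
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    static void Main()
    {
        // Compare instead of printing the text, so the result does not depend on the console's encoding.
        Console.WriteLine(RoundTrip("amé") == "amé");
    }
}
```

Decoding only the bytes that Receive reports also avoids turning the unused part of the buffer into trailing '\0' characters in the resulting string.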
Philippe Paré
  • You said [here](http://stackoverflow.com/questions/15082285/sending-a-string-containing-special-characters-through-a-tcpclient-byte#comment21213160_15082285) that you tried that already and it did not work. What changed? – Scott Chamberlain Sep 09 '14 at 04:01
  • No. C#'s `char` data type holds one UTF-16 code unit, one or two of which encode a Unicode codepoint. UTF-8 encodes a Unicode codepoint in 1 to 4 bytes. It doesn't matter which encoding you use as long as you use the same one on both sides and the encoding does not cause you to lose data by not being able to represent the characters you need. If it can't, GetBytes() will take some action. The standard action is to substitute "?"; throwing an exception is also common; truncation is not common but you could code it that way if you wanted to cause data corruption. – Tom Blodget Sep 09 '14 at 04:22
  • Scott, clearly I had something else wrong about the code. Utf-8 encoding works perfectly when used on both sides. I updated the question so that people don't get misled with me saying the utf-8 doesn't work. – Philippe Paré Sep 09 '14 at 11:50
  • Tom, what I meant to say is that however C# stores the char itself, it's 2 bytes and therefore ASCII doesn't help with special characters like "é" – Philippe Paré Sep 09 '14 at 11:52
  • @PhilippeParé and what Tom is saying is C# uses UTF-16 internally which could be 2 or 4 bytes in size. For example `U+1D11E` ([MUSICAL SYMBOL G CLEF](http://en.wikipedia.org/wiki/UTF-16#Examples)) is representable but it would be the four bytes `D8 34 DD 1E` in memory. – Scott Chamberlain Sep 09 '14 at 13:15
  • That's interesting! I've never seen that happen. I guess it would store all chars as 4 bytes when only one of the chars in the string uses, let's say, U+1D11E? – Philippe Paré Sep 09 '14 at 13:30
5

Your problem appears to be the Encoding.ASCII.GetBytes("amé"); and Encoding.ASCII.GetString(buffer); calls, as hinted at by '500 - Internal Server Error' in his comments.

The é character is a multi-byte character which is encoded in UTF-8 with the byte sequence C3 A9. When you use the Encoding.ASCII class to encode and decode, the é character is converted to a question mark since it does not have a direct ASCII encoding. This is true of any character that has no direct coding in ASCII.

Change your code to use Encoding.UTF8.GetBytes() and Encoding.UTF8.GetString() and it should work for you.
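
As a rough illustration of the difference, the sketch below prints the bytes each encoding produces for "amé". The class name EncodingDemo and the helper Hex are made up for this example; the encodings themselves are the ones discussed in this thread.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    // Hypothetical helper: returns the encoded bytes as a hex string such as "61-6D-3F".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string text = "amé";

        // ASCII has no 'é', so the default fallback substitutes '?' (0x3F).
        Console.WriteLine(Hex(Encoding.ASCII, text));                      // 61-6D-3F

        // UTF-8 encodes 'é' (U+00E9) as the two bytes C3 A9.
        Console.WriteLine(Hex(Encoding.UTF8, text));                       // 61-6D-C3-A9

        // ISO-8859-1 can also represent 'é', as the single byte E9.
        Console.WriteLine(Hex(Encoding.GetEncoding("iso-8859-1"), text));  // 61-6D-E9

        // Once the '?' has been written, decoding cannot bring the 'é' back.
        string viaAscii = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
        Console.WriteLine(viaAscii == "am?");                              // True
    }
}
```

The substitution happens in GetBytes() on the sending side, so by the time the bytes reach the socket the original character is already gone.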

Corey
0

Your question and your error are not clear to me, but using a Base64 string may solve the problem.
Something like this:

public static string EncodeTo64(string toEncode)
{
    // UTF-8 rather than ASCII here, so that accented characters survive the round trip
    byte[] toEncodeAsBytes = System.Text.Encoding.UTF8.GetBytes(toEncode);
    return System.Convert.ToBase64String(toEncodeAsBytes);
}

public static string DecodeFrom64(string encodedData)
{
    byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
    return System.Text.Encoding.UTF8.GetString(encodedDataAsBytes);
}
Mohsen Heydari
  • Tried implementing this, not working... I get errors saying that the string is not in base64... – Philippe Paré Feb 26 '13 at 06:38
  • Alright! Found a way around this huge problem. I'm now using the "iso-8859-1" encoding. Here's a bit of code for anyone interested in the future. Encoding encoding = Encoding.GetEncoding("iso-8859-1"); byte[] message = encoding.GetBytes("babé"); The result server side : "babé" ! Thanks anyways for all the answers :) – Philippe Paré Feb 26 '13 at 06:48