
I'm getting lots of invalid Unicode characters from my serial reading. Changing the Encoding helped but didn't solve the issue. The message should be "Hello World!" followed by 0x0D. Half of the time I get "Hello World!" just fine, but the other half I get weird Unicode characters, like in the image below.

I'm using an industrial controller (PLC) to send the ASCII codes 48h 65h 6Ch 6Ch 6Fh 20h 57h 6Fh 72h 6Ch 64h 21h 0Dh, with byte order most significant byte first. The ASCII codes stand for "Hello World!(CR)", where (CR) means carriage return. If you count the invalid characters, you will notice there are as many as in the actual message... It looks like my computer somehow can't understand the messages half of the time.

I tried changing the Encoding to BigEndianUnicode, UTF-8, UTF-32, Unicode, ASCII, and GetEncoding("Windows-1252"), but it didn't work: I either get "\0" before every char or invalid Unicode characters half of the time. Could someone shed some light on this matter?

[screenshot of the garbled output]
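For reference, the mismatch is reproducible without any serial hardware. A minimal sketch, assuming only the thirteen bytes described above and .NET's default decoder fallback:

    // Demonstrates how a UTF-16BE decoder mangles a one-byte-per-character frame.
    using System;
    using System.Text;

    class MojibakeDemo
    {
        static void Main()
        {
            // The exact bytes the PLC sends: "Hello World!" followed by CR.
            byte[] frame = { 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57, 0x6F,
                             0x72, 0x6C, 0x64, 0x21, 0x0D };

            // BigEndianUnicode consumes two bytes per character (0x48 0x65 -> U+4865),
            // so the 13 ASCII bytes come out as six CJK-looking characters plus a
            // replacement character for the dangling odd byte.
            Console.WriteLine(Encoding.BigEndianUnicode.GetString(frame));

            // ASCII maps one byte to one character and recovers the intended text.
            Console.WriteLine(Encoding.ASCII.GetString(frame)); // Hello World!
        }
    }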

The serial port constructor:

    _serialPort = new SerialPort(cbPort.Text);
    _serialPort.BaudRate = Int32.Parse(cbBaudrate.SelectedItem.ToString());
    _serialPort.Parity = Parity.None;
    _serialPort.StopBits = StopBits.One;
    _serialPort.DataBits = 8;
    _serialPort.ReadTimeout = 500;
    _serialPort.Encoding = System.Text.Encoding.BigEndianUnicode;
    _serialPort.Open();
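(As the comments below conclude, the `Encoding` line above is the likely culprit. For a device that sends one byte per character, the setting would presumably be the following; a sketch, not a confirmed fix:)

    // One byte per character on the wire, so decode one byte per character.
    _serialPort.Encoding = System.Text.Encoding.ASCII;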

The ReadSerial Event:

    private void ReadSerialEvent(object sender, SerialDataReceivedEventArgs er)
    {
        try
        {
            while (_serialPort.BytesToRead > 0)
            {
                char read = (char)_serialPort.ReadChar();
                switch (read)
                {
                    case '\r':      // plain carriage return: ignored
                        break;
                    case '\u0d00':  // presumably CR (0x0D) landing in the high byte of a UTF-16 pair
                        ShowSerialData(message);
                        message = "";
                        break;
                    default:
                        message += read;
                        break;
                }
            }
        }
        catch (TimeoutException)
        {
            // ReadChar can time out (ReadTimeout = 500 ms); swallow it and
            // wait for the next DataReceived event.
        }
    }
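As a diagnostic, the raw bytes can be logged as hex before any decoding is attempted, which shows exactly what the port delivers. A rough sketch (the handler name is hypothetical; `_serialPort` is the question's own field):

    // Dumps whatever arrived as hex so it can be compared against the frame the
    // PLC claims to send (48 65 6C 6C 6F 20 57 6F 72 6C 64 21 0D).
    private void DumpIncomingBytes(object sender, SerialDataReceivedEventArgs e)
    {
        int available = _serialPort.BytesToRead;
        byte[] buffer = new byte[available];
        int got = _serialPort.Read(buffer, 0, available);

        // Prints e.g. "48-65-6C-6C-6F-20-57-6F-72-6C-64-21-0D" for a clean frame.
        Console.WriteLine(BitConverter.ToString(buffer, 0, got));
    }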
AlexMacabu
  • Well what device are you reading from? Do you have documentation indicating that it will *only* provide text data? (My guess is that there's a mixture of text and binary messages, and you're currently interpreting the binary data as if it's text...) – Jon Skeet Mar 10 '22 at 12:55
  • Additionally display a hex representation of the data. Maybe that'll give you a hint of what those characters are. – Velvel Mar 10 '22 at 13:00
  • `'\u0d00'` can't be represented using a single byte. It's a single Unicode character that requires *three* bytes in the UTF-8 encoding. `ReadChar` reads *one or more bytes* from the serial port and converts them into a character using the encoding specified in the `Encoding` property. – Panagiotis Kanavos Mar 10 '22 at 13:41
  • Instead of reading one character at a time, read the *bytes* into a buffer, checking whether the end-of-message bytes appeared, and convert the entire buffer when a message is complete using the appropriate encoding. This uses less memory too. Even if you use `ReadChar` you'll have to read two characters and compare them to `0x0D` and `0x00` to detect the end of a message. – Panagiotis Kanavos Mar 10 '22 at 13:49 (a sketch of this approach appears after the comments)
  • `System.Text.Encoding.BigEndianUnicode` is definitely the wrong encoding here. UTF-16BE uses *two* bytes per letter (or more), so `0x48 0x65` will be converted to a *single* character. – Panagiotis Kanavos Mar 10 '22 at 13:52
  • @PanagiotisKanavos Thank you! I will try this and let you know if it works – AlexMacabu Mar 10 '22 at 13:52
  • Almost over! I've changed my ReadSerialEvent to: `try { int bytes = _serialPort.BytesToRead; byte[] buffer = new byte[bytes]; _serialPort.Read(buffer, 0, bytes); var dataAsString = System.Text.Encoding.BigEndianUnicode.GetString(buffer); ShowSerialData(dataAsString);` And now I'm only getting 1 invalid char after my string! Almost there! – AlexMacabu Mar 10 '22 at 14:44
  • The following may be helpful: https://stackoverflow.com/questions/70441938/get-scale-weight-over-serial-from-mettler-toledo-rice-lake-scale/70614758#70614758 , https://stackoverflow.com/questions/65957066/serial-to-usb-cable-from-a-scale-to-pc-some-values-are-just-question-marks/65971845#65971845 , and https://learn.microsoft.com/en-us/dotnet/api/system.io.ports.serialport.dtrenable?view=netframework-4.8 – Tu deschizi eu inchid Mar 10 '22 at 15:28
  • It's a [mojibake](https://en.wikipedia.org/wiki/Mojibake) case (example in Python): `print( "Hello World!\x0D".encode( 'utf_16_be').decode( 'utf_16'))` returns `䠀攀氀氀漀 圀漀爀氀搀℀`. You should insist on the (default) `_serialPort.Encoding = System.Text.Encoding.ASCII;`. – JosefZ Mar 10 '22 at 16:34
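Pulling the comments together (Panagiotis Kanavos' buffer-the-bytes approach plus JosefZ's ASCII encoding), a hedged sketch of what the receive handler could look like; `_serialPort` and `ShowSerialData` are the question's own names, `_rxBuffer` is an assumed new field, and `System.Collections.Generic` / `System.Text` usings are required:

    // Accumulates raw bytes until a CR terminator, then decodes one whole message.
    private readonly List<byte> _rxBuffer = new List<byte>();

    private void ReadSerialEvent(object sender, SerialDataReceivedEventArgs er)
    {
        // Drain everything available as raw bytes; no Encoding is involved here.
        int available = _serialPort.BytesToRead;
        byte[] chunk = new byte[available];
        int got = _serialPort.Read(chunk, 0, available);

        for (int i = 0; i < got; i++)
        {
            if (chunk[i] == 0x0D) // CR marks the end of one message
            {
                // One byte per character on the wire, so ASCII is the matching decoder.
                ShowSerialData(Encoding.ASCII.GetString(_rxBuffer.ToArray()));
                _rxBuffer.Clear();
            }
            else
            {
                _rxBuffer.Add(chunk[i]);
            }
        }
    }

With raw `Read`, the port's `Encoding` property is never consulted; it only affects the character-based APIs such as `ReadChar`, `ReadLine`, and `ReadExisting`.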

0 Answers