
I'm writing some binary protocol messages in .NET using strings, and it mostly works, except for one particular case.

The message I'm trying to send is:

String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";  
myDevice.Write(Encoding.ASCII.GetBytes(cmdPacket));

(to help decode, those bytes are 253, 11, 22, then the ASCII chars: "MBEPEXE1.").

However, when I call Encoding.ASCII.GetBytes, the 0xFD comes out as byte 0x3F (value 253 becomes 63).

(I should point out that the \x0B and \x16 are interpreted correctly as Hex 0B & Hex 16)

I've also tried Encoding.UTF8 and Encoding.UTF7, to no avail.

I feel there is probably a simple way to express values above 127 in strings and convert them to bytes, but I'm missing it.

Any guidance?

Nicholas Carey
abelenky
  • Technically character values larger than 127 are not ASCII characters. :) – Some programmer dude Aug 08 '13 at 16:37
  • @JoachimPileborg: Acknowledged. But they're valid byte-values. Just trying to figure out how to put them in strings, then get them as byte-values for transmission. – abelenky Aug 08 '13 at 16:37
  • @abelenky: Then it's pretty confusing that you have this question tagged as ASCII, and you're trying to use `Encoding.ASCII.GetBytes`, when you know it's not ASCII-compliant. – Andrew Coonce Aug 08 '13 at 16:39
  • You're passing that part of the string to `GetBytes` which is of course going to interpret them as normal characters (not hex values). I think you should just deal with raw byte values differently. IE instantiate a `List`, add the raw hex values to it, use `GetBytes` on the normal text, add the result to the list, do any conversion necessary (ToArray) and pass that to `Write` – evanmcdonnal Aug 08 '13 at 16:41
  • @AndrewCoonce: Changed title to say I'm working with Byte values. – abelenky Aug 08 '13 at 16:46
  • @Someprogrammerdude, Aren't characters above 127 called Extended Ascii characters? Because I'm in same situation, I need to know what should I call [DOS code pages](https://en.wikipedia.org/wiki/Category:DOS_code_page)? They include characters above 127 too. e.g. translating from `dos code page 720`. These code pages are categorized under 8-bit code in [Wikipedia ASCII article](https://en.wikipedia.org/wiki/ASCII) – AaA May 02 '17 at 09:20
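
The suggestion in the comments above (keep the raw byte values out of the string and encode only the text part) could be sketched roughly as follows. `PacketBuilder` and `BuildPacket` are hypothetical names introduced here for illustration; `myDevice.Write` is the asker's own API and appears only in a comment.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PacketBuilder
{
    // Hypothetical helper: raw header bytes followed by ASCII-encoded text.
    public static byte[] BuildPacket(byte[] header, string asciiText)
    {
        var packet = new List<byte>(header.Length + asciiText.Length);
        packet.AddRange(header);                              // raw values, no encoding involved
        packet.AddRange(Encoding.ASCII.GetBytes(asciiText));  // this part is assumed to be plain ASCII
        return packet.ToArray();
    }
}

// Usage, assuming the asker's myDevice object:
// myDevice.Write(PacketBuilder.BuildPacket(new byte[] { 0xFD, 0x0B, 0x16 }, "MBEPEXE1."));
```

This sidesteps the encoding question entirely, at the cost of splitting each message into a binary part and a text part.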

4 Answers


Setting aside whether what you are doing is a good idea, the ISO-8859-1 encoding maps each of its 256 byte values to the Unicode character with the same code.

using System.Diagnostics;
using System.Linq;
using System.Text;

// Bytes with all the possible values 0-255
var bytes = Enumerable.Range(0, 256).Select(p => (byte)p).ToArray();

// String containing the values
var all1bytechars = new string(bytes.Select(p => (char)p).ToArray());

// Sanity check
Debug.Assert(all1bytechars.Length == 256);

// The encoder, you could make it static readonly
var enc = Encoding.GetEncoding("ISO-8859-1"); // It is the codepage 28591

// string-to-bytes
var bytes2 = enc.GetBytes(all1bytechars);

// bytes-to-string
var all1bytechars2 = enc.GetString(bytes);

// check string-to-bytes
Debug.Assert(bytes.SequenceEqual(bytes2));

// check bytes-to-string
Debug.Assert(all1bytechars.SequenceEqual(all1bytechars2));

From the wiki:

ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
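
Applied to the packet from the question, that identity mapping means a single `GetBytes` call should be enough. A minimal sketch, in which `Latin1Packet` and `Encode` are hypothetical names, and which assumes every character in the string lies in the range U+0000 to U+00FF:

```csharp
using System.Text;

public static class Latin1Packet
{
    // ISO-8859-1 is code page 28591; each char 0x00-0xFF maps to the same byte value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: encodes a string whose chars are all <= U+00FF.
    public static byte[] Encode(string packet)
    {
        return Latin1.GetBytes(packet);
    }
}

// Usage, assuming the asker's myDevice object:
// myDevice.Write(Latin1Packet.Encode("\xFD\x0B\x16MBEPEXE1."));
```

Characters above U+00FF cannot be represented and are substituted rather than preserved, so this only suits strings built from byte-range values. Also note that a C# `\x` escape consumes up to four hex digits, so a byte followed by text that starts with a hex digit (for example `"\x16BE"`) is read as one character; `\u0016` avoids that ambiguity.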

Alternatively, here is a simple and fast method to convert a string to a byte[] (with checked and unchecked variants):

public static byte[] StringToBytes(string str)
{
    var bytes = new byte[str.Length];

    for (int i = 0; i < str.Length; i++)
    {
        bytes[i] = checked((byte)str[i]); // Slower but throws OverflowException if there is an invalid character
        //bytes[i] = unchecked((byte)str[i]); // Faster
    }

    return bytes;
}
xanatos

ASCII is a 7-bit code. The high-order bit was historically used as a parity bit, so "ASCII" could have even, odd or no parity. You may notice that 0x3F (decimal 63) is the ASCII character ?. That is what the CLR's ASCII encoding substitutes for non-ASCII characters (those with values greater than 0x7F/decimal 127), because ASCII defines no representation for code points in the range 0x80–0xFF.
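
To make the substitution visible, here is a small sketch contrasting the default ASCII encoder with one configured to throw instead. `AsciiFallbackDemo`, `EncodeLossy` and `EncodeStrict` are hypothetical names used only for this illustration.

```csharp
using System.Text;

public static class AsciiFallbackDemo
{
    // Default ASCII encoder: characters it cannot encode are replaced with '?' (0x3F).
    public static byte[] EncodeLossy(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // Strict ASCII encoder: throws EncoderFallbackException instead of substituting.
    public static byte[] EncodeStrict(string s)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strict.GetBytes(s);
    }
}
```

With the strict variant, a stray non-ASCII character fails loudly at encode time rather than reaching the device as a `?`.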

C# strings are UTF-16 encoded Unicode internally. If what you care about are the byte values of the strings, and you know that the strings consist only of characters whose Unicode code points are in the range U+0000 through U+00FF, then it's easy. Unicode's first 256 code points (0x00–0xFF), that is, the Unicode blocks C0 Controls and Basic Latin (\x00-\x7F) and C1 Controls and Latin-1 Supplement (\x80-\xFF), are the "normal" ISO-8859-1 characters. A simple incantation like this:

String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";  
byte[] buffer = cmdPacket.Select(c => (byte)c).ToArray();
myDevice.Write(buffer);

will get you the byte[] you want, in this case

// \xFD   \x0B   \x16   M      B      E     P      E      X      E      1      .
[  0xFD , 0x0B , 0x16 , 0x4d , 0x42 , 0x45, 0x50 , 0x45 , 0x58 , 0x45 , 0x31 , 0x2E ]
Nicholas Carey

With LINQ, you could do something like this:

String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";  
myDevice.Write(cmdPacket.Select(Convert.ToByte).ToArray());

Edit: Added an explanation

First, you recognize that your string is really just an array of characters. What you want is an "equivalent" array of bytes, where each byte corresponds to a character.

To get the array, you have to "map" each character of the original array as a byte in the new array. To do that, you can use the built-in System.Convert.ToByte(char) method.

Once you've described your mapping from characters to bytes, it's as simple as projecting the input string, through the mapping, into an array.
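
One property of this mapping worth knowing: `Convert.ToByte(char)` throws an `OverflowException` for any character above U+00FF, whereas a plain `(byte)` cast in an unchecked context silently keeps only the low 8 bits. A minimal sketch of both behaviors follows; `CharMapping` and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class CharMapping
{
    // Throws OverflowException if any character is above U+00FF.
    public static byte[] ToBytesChecked(string s)
    {
        return s.Select(c => Convert.ToByte(c)).ToArray();
    }

    // Silently truncates each character to its low 8 bits.
    public static byte[] ToBytesTruncating(string s)
    {
        return s.Select(c => unchecked((byte)c)).ToArray();
    }
}
```

For a wire protocol, the throwing variant is probably the safer default, since a truncated character would otherwise be sent as an unrelated byte.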

Hope that helps!

Andrew Coonce

I use Windows-1252, as it seems to give the most bang for the byte.
It accepts any .NET string, although characters that Windows-1252 cannot represent are substituted rather than preserved.
You will probably want to comment out the ToLowerInvariant calls.
This was built for compatibility with SQL char (single byte).

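using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;

// Note: on .NET Core / .NET 5+, Windows-1252 is only available after calling
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
namespace String1byte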
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            String8bit s1 = new String8bit("cat");
            String8bit s2 = new String8bit("cat");
            String8bit s3 = new String8bit("\xFD\x0B\x16MBEPEXE1.");
            HashSet<String8bit> hs = new HashSet<String8bit>();
            hs.Add(s1);
            hs.Add(s2);
            hs.Add(s3);
            System.Diagnostics.Debug.WriteLine(hs.Count.ToString());
            System.Diagnostics.Debug.WriteLine(s1.Value + " " + s1.GetHashCode().ToString());
            System.Diagnostics.Debug.WriteLine(s2.Value + " " + s2.GetHashCode().ToString());
            System.Diagnostics.Debug.WriteLine(s3.Value + " " + s3.GetHashCode().ToString());
            System.Diagnostics.Debug.WriteLine(s1.Equals(s2).ToString());
            System.Diagnostics.Debug.WriteLine(s1.Equals(s3).ToString());
            System.Diagnostics.Debug.WriteLine(s1.MatchStart("ca").ToString());
            System.Diagnostics.Debug.WriteLine(s3.MatchStart("ca").ToString());
        }
    }

    public struct String8bit
    {
        private static Encoding EncodingUnicode = Encoding.Unicode;
        private static Encoding EncodingWin1252 = System.Text.Encoding.GetEncoding("Windows-1252");
        private byte[] bytes;
        public override bool Equals(Object obj)
        {
            // Check for null values and compare run-time types.
            if (obj == null) return false;
            if (!(obj is String8bit)) return false;
            String8bit comp = (String8bit)obj;
            if (comp.Bytes.Length != this.Bytes.Length) return false;
            for (Int32 i = 0; i < comp.Bytes.Length; i++)
            {
                if (comp.Bytes[i] != this.Bytes[i])
                    return false;
            }
            return true;
        }
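        public override int GetHashCode()
        {
            // XOR every byte into one of the four byte lanes of a 32-bit hash.
            // (The original shifted Bytes[0] on each iteration, ignoring the other bytes.)
            UInt32 hash = 0;
            for (Int32 i = 0; i < Bytes.Length; i++) hash ^= (UInt32)Bytes[i] << ((i % 4) * 8);
            return unchecked((Int32)hash);
        }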
        public bool MatchStart(string start)
        {
            if (string.IsNullOrEmpty(start)) return false;
            if (start.Length > this.Length) return false;
            start = start.ToLowerInvariant();   // SQL is case insensitive
            // Convert the string into a byte array
            byte[] unicodeBytes = EncodingUnicode.GetBytes(start);
            // Perform the conversion from one encoding to the other 
            byte[] win1252Bytes = Encoding.Convert(EncodingUnicode, EncodingWin1252, unicodeBytes);
            for (Int32 i = 0; i < win1252Bytes.Length; i++) if (Bytes[i] != win1252Bytes[i]) return false;
            return true;
        }
        public byte[] Bytes { get { return bytes; } }
        public String Value { get { return EncodingWin1252.GetString(Bytes); } }
        public Int32 Length { get { return Bytes.Length; } }
        public String8bit(string word)
        {
            word = word.ToLowerInvariant();     // SQL is case insensitive
            // Convert the string into a byte array 
            byte[] unicodeBytes = EncodingUnicode.GetBytes(word);
            // Perform the conversion from one encoding to the other 
            bytes = Encoding.Convert(EncodingUnicode, EncodingWin1252, unicodeBytes);
        }
        public String8bit(Byte[] win1252bytes)
        {   // if reading from SQL char then read as System.Data.SqlTypes.SqlBytes
            bytes = win1252bytes;
        }
    }
}
paparazzo