
I am working on a problem in C# and am having trouble converting a string that contains multiple hex escape values to a byte[].

string word = "\xCD\x01\xEF\xD7\x30";

(\x starts each new value, so I have: CD 01 EF D7 30)

This is my first time asking a question here, so please let me know if you need anything extra from me.

More information on the project:

I need to be able to change both

"apple" and "\xCD\x01\xEF\xD7\x30" to a byte array.

For the normal string "apple" I use

byte[] data = Encoding.ASCII.GetBytes(word);

This does not seem to work with "\xCD\x01\xEF\xD7\x30"; I get the values

63, 1, 63, 63, 48 
xanatos
Tyler
  • "byte array" isn't a format. It is a container. To transform a string into a byte array you must choose a format, an Encoding: `Encoding.UTF8.GetBytes(...)` or `Encoding.Unicode.GetBytes(...)`, for example. You can even use `Encoding.GetEncoding("iso-8859-1").GetBytes(...)` to get an encoding that is "identical" to the low byte of each Unicode character. Ah... and please forget the word ASCII ever existed. Trust me on this. – xanatos Jan 04 '21 at 17:39
  • `Encoding.ASCII` refers to the 7-bit US-ASCII that only accepts values up to 127/`7F`. The *escape sequences* you posted go beyond that. `63` is `?`, the replacement character used when invalid values are encountered. What you posted isn't what you think in any case. .NET strings are UTF-16, so `\xCD` refers to a *16-bit* value, whose high byte is `00` and low byte is `CD`. – Panagiotis Kanavos Jan 04 '21 at 17:42
  • What do you expect to get after the conversion to bytes? You can already access the 16-bit `Char` objects in that string; you don't need an explicit encoding. If you want an exact conversion, you need `Encoding.Unicode`, which will convert each `Char` to two bytes (or four). If you use `Encoding.UTF8`, *all* characters in the ASCII range will be converted to one byte, and all characters outside it to two or more bytes. – Panagiotis Kanavos Jan 04 '21 at 17:44
  • If you wanted to use that `word` as a way to store hex values, it's not a good way to do it. You'll have to find an encoding that allows *all* those values and use it with `Encoding.GetEncoding()`. The [Latin1/ISO-8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1) encoding is missing about 33 characters that would be replaced with `63`, including `01` – Panagiotis Kanavos Jan 04 '21 at 17:47
  • Does this answer your question? [How do you convert a byte array to a hexadecimal string, and vice versa?](https://stackoverflow.com/questions/311165/how-do-you-convert-a-byte-array-to-a-hexadecimal-string-and-vice-versa) – Charlieface Jan 04 '21 at 17:49
  • @PanagiotisKanavos Not exactly true... From the same page _In 1990, the very first version of Unicode used the code points of ISO-8859-1 as the first 256 Unicode code points._ and in fact the ISO-8859 encoding in .NET maps the unicode 0x00-0xFF to the byte codes 0x00-0xFF and back. The trick is in the next sentence: _In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1.... thus provides for 256 characters via every possible 8-bit value._ – xanatos Jan 04 '21 at 18:06
  • @xanatos indeed, but I still wouldn't use such a string to store bytes. It's far slower than using a `byte[]` with hex literals and takes 4x the space – Panagiotis Kanavos Jan 04 '21 at 18:18
  • @Charlieface note that that question, I think, concerns itself with e.g. converting the 10-char string `"CD01EFD730"` to a 5-byte array `{ 0xCD, 0x01, 0xEF, 0xD7, 0x30 }`, which differs slightly from this one. – Caius Jard Jan 04 '21 at 19:08
  • Thanks for all the fast responses! xanatos's suggestion was the easiest: just removing ASCII from the equation. I went with `Encoding.GetEncoding("iso-8859-1").GetBytes(...)` and it works in both directions for me! – Tyler Jan 04 '21 at 20:36

1 Answer


Ok... You were trying to directly "downcast"/"upcast" char <-> byte (where char is the C# char that is 16 bits long, and byte is 8 bits long).

There are various ways to do it. The simplest (probably not the most performant) is to use the iso-8859-1 encoding, which "maps" the byte values 0-255 to the Unicode code points 0-255 (and back).

Encoding enc = Encoding.GetEncoding("iso-8859-1");

string str = "apple";
byte[] bytes = enc.GetBytes(str);
string str2 = enc.GetString(bytes);
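
To tie this back to the string in the question, here is a minimal sketch that wraps the same calls in a small helper and keeps an ASCII variant for comparison. The class name `Latin1Bytes` and its method names are hypothetical, chosen only for illustration; the only library calls used are `Encoding.GetEncoding`, `GetBytes`, `GetString`, and `Encoding.ASCII`.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration; not part of the original answer.
public static class Latin1Bytes
{
    // ISO-8859-1 maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Each char in the range 0..255 becomes the byte with the same value.
    public static byte[] FromString(string text) => Latin1.GetBytes(text);

    // Each byte becomes the char with the same numeric value.
    public static string ToText(byte[] bytes) => Latin1.GetString(bytes);

    // For comparison only: ASCII replaces every char above 0x7F with '?' (63).
    public static byte[] FromStringAscii(string text) => Encoding.ASCII.GetBytes(text);
}
```

With the string from the question, `Latin1Bytes.FromString("\xCD\x01\xEF\xD7\x30")` should give 205, 1, 239, 215, 48, while `FromStringAscii` reproduces the 63, 1, 63, 63, 48 reported there. One thing to watch: the C# `\x` escape reads up to four hex digits, so a literal such as `"\xCDab"` would be parsed as a single character; `\u00CD` avoids that ambiguity.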

You can even do a little LINQ:

string str = "apple";

// This is "bad" if the string contains code points > 255:
// the cast silently keeps only the low 8 bits of each char
byte[] bytes = str.Select(x => (byte)x).ToArray();

// This is always safe, because by definition any value of a byte
// is a legal unicode character
string str2 = string.Concat(bytes.Select(x => (char)x));
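
If silent truncation is a concern, one possible variation is to reject characters that do not fit in a byte instead of casting them blindly. This is only a sketch under that assumption; the `StrictCast` class and its method names are hypothetical and not part of the answer above.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; a stricter take on the LINQ cast.
public static class StrictCast
{
    // Converts each char to a byte, throwing if any char is above U+00FF
    // rather than silently dropping its high 8 bits.
    public static byte[] ToBytes(string text) =>
        text.Select(c => c <= 0xFF
                ? (byte)c
                : throw new ArgumentException(
                    $"U+{(int)c:X4} does not fit in one byte", nameof(text)))
            .ToArray();

    // Always safe: every byte value 0..255 is a valid char value.
    public static string ToText(byte[] bytes) =>
        string.Concat(bytes.Select(b => (char)b));
}
```

Because `ToArray()` evaluates the query immediately, the exception surfaces at the call to `StrictCast.ToBytes` rather than later during enumeration.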
xanatos