0

How do I convert from 8-bit bytes to 7-bit bytes (Base 256 to Base 128)?

I am looking to do something like this:

// "in" is a reserved word in C#, so the parameters are named bytes and text.
public string BytesToString(byte[] bytes)
{

}

public byte[] StringToBytes(string text)
{

}

I know base64 is available but it expands a byte array too much.

Greg Finzer

4 Answers

5

Base64 encodes 6 bits per character, producing a string which can be reliably transmitted with very little effort (modulo being careful about URLs).
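For reference, here is a minimal sketch of the two methods from the question done with the built-in Base64 support. The class name `Base64Codec` is made up for this example; `Convert.ToBase64String` and `Convert.FromBase64String` are the standard .NET calls.

```csharp
using System;

// Hypothetical wrapper class; only the Convert calls are framework API.
public static class Base64Codec
{
    // Encode raw bytes as Base64 text (6 bits of payload per character).
    public static string BytesToString(byte[] bytes)
    {
        return Convert.ToBase64String(bytes);
    }

    // Decode Base64 text back into the original bytes.
    public static byte[] StringToBytes(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

For example, `Base64Codec.BytesToString(new byte[] { 0, 255, 128 })` returns `"AP+A"`. The `+` in that output is the kind of character that needs care in URLs.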

There is no 7-bit alphabet with the same properties - many, many systems will fail if they're given control characters, for example.
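To make the problem concrete, here is a rough sketch of what a naive base-128 packing could look like: it simply regroups the input bits seven at a time and uses each group as a character code 0..127. The class name `Base128Sketch` and the zero-padding of the last group are assumptions for this illustration, not an established format.

```csharp
using System;
using System.Text;

// Illustrative only: a naive 7-bits-per-char packing, not a standard encoding.
public static class Base128Sketch
{
    // Pack 8-bit bytes into chars that each carry 7 bits (values 0..127).
    public static string BytesToString(byte[] bytes)
    {
        var sb = new StringBuilder();
        int buffer = 0;   // pending bits, oldest bits in the high positions
        int bitCount = 0; // number of pending bits held in buffer

        foreach (byte b in bytes)
        {
            buffer = (buffer << 8) | b;
            bitCount += 8;
            while (bitCount >= 7)
            {
                bitCount -= 7;
                sb.Append((char)((buffer >> bitCount) & 0x7F));
            }
            buffer &= (1 << bitCount) - 1; // keep only the leftover bits
        }

        if (bitCount > 0)
        {
            // Left-align the final partial group and pad it with zero bits.
            sb.Append((char)((buffer << (7 - bitCount)) & 0x7F));
        }
        return sb.ToString();
    }

    // Reverse the packing; any padding bits at the end are dropped.
    public static byte[] StringToBytes(string text)
    {
        var result = new byte[text.Length * 7 / 8];
        int buffer = 0, bitCount = 0, index = 0;

        foreach (char c in text)
        {
            buffer = (buffer << 7) | (c & 0x7F);
            bitCount += 7;
            if (bitCount >= 8)
            {
                bitCount -= 8;
                result[index++] = (byte)((buffer >> bitCount) & 0xFF);
            }
            buffer &= (1 << bitCount) - 1; // keep only the leftover bits
        }
        return result;
    }
}
```

With this sketch, a single zero byte encodes to two NUL characters, so the output can contain exactly the control characters that many systems mangle or truncate.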

Are you absolutely sure that you're not going to need to go through any such systems (including for storage)? Is the extra tiny bit of space saving really enough to justify having to worry about whether something's going to change "\n" to "\r\n" or vice versa, or drop character 0?

(For a storage example, 2100 bytes = 2800 chars in base64, or 2400 chars in base128. Not a massive difference IMO.)
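The arithmetic behind those figures can be sketched as below. The class and method names are made up for this example, and the Base64 count assumes the usual `=` padding to a multiple of four characters.

```csharp
using System;

// Hypothetical helper for comparing encoded sizes.
public static class EncodedSize
{
    // Base64 with padding: 4 output chars for every 3 input bytes, rounded up.
    public static int Base64Chars(int byteCount)
    {
        return (byteCount + 2) / 3 * 4;
    }

    // 7 bits per char: ceil(8 * byteCount / 7) output chars.
    public static int Base128Chars(int byteCount)
    {
        return (byteCount * 8 + 6) / 7;
    }
}
```

`EncodedSize.Base64Chars(2100)` gives 2800 and `EncodedSize.Base128Chars(2100)` gives 2400, so base-128 would save 400 characters out of 2800, roughly 14%.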

I'd strongly urge you to see whether you can find the extra storage space - it's likely to save a lot of headaches later.

Jon Skeet
  • Jon, I am looking for something like this: http://www.koders.com/java/fid45DBB362CAC753027494F4B0C53F36F1A45C3BF0.aspx?s=base64 I am not sure how to convert this to C#. – Greg Finzer Jun 11 '09 at 14:50
  • Okay, will look into porting that code. I'd still use Base64 personally, unless you *really* have to squeeze it as much as possible. – Jon Skeet Jun 11 '09 at 15:28
0

It's a bit difficult to determine from your question (as it currently stands) what you're trying to achieve. Are you trying to do base-128 encoding, or are you trying to convert a series of (presumably hexadecimal) digits representing 7-bit numbers into the equivalent binary 8-bit numbers?

The encoding I just described is the one used in the ID3v2 tag format for encoding the size field in the header.

If this is what you're trying to achieve, then perhaps something like the code below will do the trick. It's based on the '257' example in the ID3 specification:

[Test]
public void GetInt()
{
    var bytes = new byte[] { 0, 0, 2, 1};

    var result = 0;

    foreach (var b in bytes)
    {
        result <<= 7;
        result = result + (b & 0x7f);
    }

    Assert.That(result, Is.EqualTo(257));
}

[Test]
public void SetInt()
{
    var i = 257;

    var bytes = new Stack<byte>();

    for (var j = 0 ; j < sizeof(int) ; j++)
    {
        var b = (byte)(i & 0x7f);
        bytes.Push(b);
        i >>= 7;
    }

    Assert.That(bytes.Pop(), Is.EqualTo(0));
    Assert.That(bytes.Pop(), Is.EqualTo(0));
    Assert.That(bytes.Pop(), Is.EqualTo(2));
    Assert.That(bytes.Pop(), Is.EqualTo(1));
}
Damian Powell
  • Argh. Just read *all* the comments here and realised that this is *not* what you're looking for. However, you could get close by taking the byte value, adding 32, and converting to a char (and vice versa). I would also echo Jon Skeet's recommendation of using base-64 though. It's a well-known standard that won't confuse the socks off of those that follow you! – Damian Powell Sep 04 '09 at 17:47
0

Is UTF-7 what you're looking for?

Kip
  • No, basically I need to convert bytes from a range of 0 to 255 into ASCII and then back again to a byte array. I would like more than the range of A-Z0-9 that base64 provides. The intention is to binary serialize an object, convert the bytes to Base128 then on the other end, convert the string back to bytes. – Greg Finzer Jun 11 '09 at 14:55
  • i don't know the details of what you're doing, but i'm surprised that the low-end ascii control characters (0-32, or 0x00-0x20) are okay for you, but non-ascii (128-255, or 0x80-0xff) are not okay. especially 0, which is frequently interpreted as a string terminator. i'd echo what skeet said above and recommend base-64. it is a very widely used standard and it can represent binary data solely in safe ascii characters, and i'm sure c# already has support built-in – Kip Jun 11 '09 at 20:00
0

Alternatively, there's the ASCIIEncoding class, which converts a string to an array of 8-bit bytes, discarding characters that can't be represented in 7-bit ASCII.
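A quick way to see what the ASCII encoding does with data outside the 7-bit range is the sketch below. `AsciiDemo` and its method names are made up for this example; `Encoding.ASCII` is the framework's shared ASCIIEncoding instance, used here with its default settings.

```csharp
using System;
using System.Text;

// Hypothetical demo class; only Encoding.ASCII is framework API.
public static class AsciiDemo
{
    // Encode text as ASCII bytes and decode it again.
    public static string RoundTripText(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    // Decode arbitrary bytes as ASCII and encode them again.
    public static byte[] RoundTripBytes(byte[] bytes)
    {
        string text = Encoding.ASCII.GetString(bytes);
        return Encoding.ASCII.GetBytes(text);
    }
}
```

`AsciiDemo.RoundTripText("café")` returns `"caf?"`, and `AsciiDemo.RoundTripBytes(new byte[] { 0x41, 0xFF })` returns the bytes `0x41, 0x3F`. Either way, anything outside 0..127 does not survive the trip, so this is not a lossless way to carry binary data.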

Michiel Buddingh
  • Unless I've misinterpreted the question, the OP doesn't want to lose data. (It also converts invalid data into "?" rather than discarding it.) – Jon Skeet Jun 10 '09 at 17:33