I've been searching for two hours or more for a way to use the Read7BitEncodedInt method for this. I need to use it somehow to reduce my file size (in this case likely by 100 MB or more). I was also looking at the ReadString method, since it seems to do roughly the same thing, but that seems less appropriate and I'm not sure it would work. If there is some other alternative that I'm unaware of, I'd be open to using that too.

In summary: how would I use the Read7BitEncodedInt method in the following code? Also, I'm not certain that my attempt at Write7BitEncodedInt is correct either.

    public void SaveFile()
    {
        using (FileStream stream = new FileStream("C:\\A_random.txt", FileMode.Create))
        {
            using (BinaryWriter writer = new BinaryWriter(stream))
            {
                for (int i = 0; i < typeCount.Count; i++)
                {
                    writer.Write((byte)typeCount[i]);
                    writer.Write(type[i]);
                }
                writer.Close();
            }
        }
        LoadFile();
    }

    public void LoadFile()
    {
        using (FileStream stream = new FileStream("C:\\A_random.txt", FileMode.Open))
        {
            using (BinaryReader reader = new BinaryReader(stream))
            {
                int i = 0;
                while (stream.Position != stream.Length)
                {
                    int count = reader.Read7BitEncodedInt();
                    byte val = reader.ReadByte();
                    for (int ii = 0; ii < count; ii++)
                    {
                        grid[i].val1 = i;
                        grid[i].val2 = val;
                        grid[i].val3 = vect;
                        i++;
                    }
                }
                reader.Close();
            }
        }
    }
mafu
Grimbly
  • If I understand correctly, you want to use the Read7BitEncodedInt method to reduce your file size. So you need to update your SaveFile method to use Write7BitEncodedInt, and make sure your LoadFile method can read it back in? – Brian Dishaw Jun 12 '11 at 00:02
  • @Brian Dishaw Ya that pretty much sums it up. – Grimbly Jun 12 '11 at 00:06
  • I found an article that explains why they are protected methods and what you can do to implement them in your own code. I'm not sure how out of date it is. What I would recommend is loading the assembly in something like .NET Reflector and taking a look at the implementation of these methods. http://www.dotnet247.com/247reference/msgs/53/268025.aspx – Brian Dishaw Jun 12 '11 at 00:13
  • `I'm not too certain that my method to Write7BitEncodedInt is correct either.` It is not. You didn't use it. Trying to save 1 bit out of 8 is pointless, just in case that's what you are doing. Use a ZIP library. Or just stop worrying when you can buy a terabyte for less than a hundred bucks. – Hans Passant Jun 12 '11 at 00:22
  • Can you explain what you believe this does to decrease file size? Because my understanding of the 7 bit encoding is that it increases file size by at least 14%. The purpose of the 7 bit encoding is to my understanding for compatibility with extremely old data transfer systems that expect the top bit of every byte to be off. It *decompresses* integers, it does not *compress* them. Why are you not using a standard compression library to compress your file, if that's what you need? – Eric Lippert Jun 12 '11 at 13:40
  • @Eric, the encoding that method uses is basically the same as [Protocol Buffers' varints](http://code.google.com/apis/protocolbuffers/docs/encoding.html#varints). It decreases size of small integers, while large integers can take up to 5 bytes. – svick Jun 12 '11 at 15:18
  • @svick: ah, that makes sense. .NET often compresses integers similarly. Neat! However, in this case I'd still think it would be easier to compress the entire stream. – Eric Lippert Jun 12 '11 at 22:37
  • @Eric I'm looking to avoid compressing the whole stream because the application will access this file many times during use, reading in only portions at a time, and I want to avoid the overhead of unzipping and rezipping each time. Also, my understanding of zipping is that seeking isn't possible or is very difficult. If all that is wrong, I'd be more than happy to look into it more though :) – Grimbly Jun 12 '11 at 23:53
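
To make the size discussion in the comments above concrete, here is a minimal sketch of a varint-style encoding of the kind svick describes: seven payload bits per byte, with the high bit marking that another byte follows. The class and method names (`VarintSketch`, `Encode`, `Decode`) are hypothetical and written only for illustration; this is not the framework's own source, though it is meant to produce the same byte counts.

```csharp
using System;
using System.Collections.Generic;

public static class VarintSketch
{
    // Emits the value 7 bits at a time, low-order group first.
    // The high bit of each byte is set when another byte follows.
    public static byte[] Encode(int value)
    {
        var bytes = new List<byte>();
        uint v = unchecked((uint)value); // negatives become large unsigned values
        while (v >= 0x80)
        {
            bytes.Add(unchecked((byte)(v | 0x80)));
            v >>= 7;
        }
        bytes.Add((byte)v);
        return bytes.ToArray();
    }

    // Reassembles the 7-bit groups until a byte without the high bit is seen.
    public static int Decode(byte[] bytes)
    {
        int result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            result |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break;
        }
        return result;
    }
}
```

Under this scheme values from 0 to 127 take one byte, values up to 16,383 take two, and any negative `int` takes five, which is why the encoding only pays off when most values are small.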

2 Answers

8

Here is a way to do it:

public class MyBinaryReader : BinaryReader {
    public MyBinaryReader(Stream stream) : base(stream) {}
    public new int Read7BitEncodedInt() {
        return base.Read7BitEncodedInt();
    }
}

public class MyBinaryWriter : BinaryWriter {
    public MyBinaryWriter(Stream stream) : base(stream) {}
    public new void Write7BitEncodedInt(int i) {
        base.Write7BitEncodedInt(i);
    }
}

And some test code:

// Requires: using System.Diagnostics; using System.IO;
void Main() {
    var stream = new MemoryStream();

    var writer = new MyBinaryWriter(stream);

    writer.Write7BitEncodedInt(100);
    writer.Write7BitEncodedInt(1000);
    writer.Write7BitEncodedInt(10000);
    writer.Write7BitEncodedInt(100000);
    writer.Write7BitEncodedInt(1000000);
    writer.Write7BitEncodedInt(-1000000);

    // Rewind so the reader starts at the first value written.
    stream.Position = 0;

    var reader = new MyBinaryReader(stream);

    Debug.Assert(reader.Read7BitEncodedInt() == 100);
    Debug.Assert(reader.Read7BitEncodedInt() == 1000);
    Debug.Assert(reader.Read7BitEncodedInt() == 10000);
    Debug.Assert(reader.Read7BitEncodedInt() == 100000);
    Debug.Assert(reader.Read7BitEncodedInt() == 1000000);
    Debug.Assert(reader.Read7BitEncodedInt() == -1000000);
}
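
Applied to the question's SaveFile/LoadFile pair, the wrappers might be used roughly as follows. This is a sketch under assumptions: the names `RunLengthFile`, `Save`, and `Load` are hypothetical, `type` is assumed to hold bytes (the question does not show its declaration), and filling `grid` is replaced by expanding the runs into a list. The key point is that the count is written with `Write7BitEncodedInt` rather than cast to `byte`, so it matches what the loader reads. As far as I know, on .NET 5 and later these two methods are public on `BinaryReader` and `BinaryWriter`, so the wrapper classes may not be needed there.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class MyBinaryReader : BinaryReader
{
    public MyBinaryReader(Stream stream) : base(stream) { }
    public new int Read7BitEncodedInt() { return base.Read7BitEncodedInt(); }
}

public class MyBinaryWriter : BinaryWriter
{
    public MyBinaryWriter(Stream stream) : base(stream) { }
    public new void Write7BitEncodedInt(int i) { base.Write7BitEncodedInt(i); }
}

public static class RunLengthFile
{
    // Writes each run as a 7-bit encoded count followed by one value byte.
    public static void Save(Stream stream, IList<int> typeCount, IList<byte> type)
    {
        var writer = new MyBinaryWriter(stream);
        for (int i = 0; i < typeCount.Count; i++)
        {
            writer.Write7BitEncodedInt(typeCount[i]);
            writer.Write(type[i]);
        }
        writer.Flush(); // the caller owns the stream, so it is left open
    }

    // Reads the runs back and expands them into one value per cell.
    public static List<byte> Load(Stream stream)
    {
        var cells = new List<byte>();
        var reader = new MyBinaryReader(stream);
        while (stream.Position != stream.Length)
        {
            int count = reader.Read7BitEncodedInt();
            byte val = reader.ReadByte();
            for (int n = 0; n < count; n++) cells.Add(val);
        }
        return cells;
    }
}
```

In the question's setting the stream would be a `FileStream` opened with `FileMode.Create` for saving and `FileMode.Open` for loading, and the inner loop would assign to `grid[i]` instead of appending to a list.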
adrianm
  • Very cool. It seems to work really well, and it's definitely much cleaner than what I was doing. Thanks :) – Grimbly Jun 15 '11 at 01:09
-3

This is a case where you have a solution to a problem in mind before you ask the question, even if the solution is a bad idea. There is a blog entry on either Raymond Chen's blog, Eric Lippert's blog or probably both about this subject, but I couldn't find it (them).

In the last 20 years, needing to care about 7-bit characters has been mostly phased out. And, the notion of trying to save disk space by using them is patently ridiculous.

The only way this would even work would be if you overlapped characters so that you effectively combined 8 characters into 7 bytes. I can almost guarantee without looking that that is not what Read7BitEncodedInt does.

(Actually, I have no idea what it does, but fortunately, I don't care)

If you need to work with large files, use compression of some kind. Zip, GZip, 7-Zip, whatever.
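
For the GZip option, a minimal sketch using `GZipStream` from `System.IO.Compression` could look like the following. The class and method names (`GzipSketch`, `Compress`, `Decompress`) are hypothetical, and it works on whole in-memory buffers for brevity, so it does not address seeking within a compressed file.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipSketch
{
    // Compresses a whole buffer into a gzip-framed byte array.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final compressed block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress, returning the original bytes.
    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

A `BinaryWriter` can also be layered directly over a `GZipStream`, so the two approaches (compact integer encoding and stream compression) are not mutually exclusive.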

EDIT: Since you are not actually asking about strings, then this may be of use. YMMV, however. Look for the ReadVLI and WriteVLI functions.

http://gilgamesh.hamsterrepublic.com/websvn/filedetails.php?repname=ohrrpgce&path=%2Fwip%2Freload.bas&rev=3568&peg=4164

Mike Caron
  • It is a very bad practice to assume that because HD space is cheap these days you'd have the right to just consume whatever you wanted on someone else's machine. By doing this manual compression I've reduced my file size from 380MB to 35MB on average. Obviously not an obsolete practice after all. Both yours and Hans Passant's comments border on being spam. Had it not been for the last minute comments about Zipping they would be. If you're not going to be constructive, just don't comment. Since you didn't seem to know this, the method in question converts an Int into 1-4 bytes as needed. – Grimbly Jun 12 '11 at 02:18
  • @Grimbly, that would not be possible. It converts it into one to **five** bytes. – svick Jun 12 '11 at 15:14
  • @Grimbly, ah, see, I was thrown off by the fact that you are storing **binary** data into a file that ends in `.txt`. I assumed you were talking about strings. I have written code myself to do this **variable-length encoding**, the link to which I will add to my answer. – Mike Caron Jun 12 '11 at 19:43
  • Ya, the .txt is just an easy placeholder for now. It will likely end up being a custom extension in the end. The Ints I'm trying to compress are typically single digits in my testing, so they compress down to single bytes. The end result will likely be values that mostly reduce to 2 bytes, and with about 50 million of them this seems worth doing. The Int in question represents the number of repetitions of the byte that follows, if anyone was curious. I've actually got this working now but didn't want to answer my own question, and I'm hopeful your answers will perform better. Thanks for the links too. – Grimbly Jun 12 '11 at 23:46