2

I want to read/write a binary file that has the following structure:

[Image: hex dump of the file, with the fields of the first record highlighted in red, green, and blue]

The file is composed of "RECORDS". Each "RECORD" has the following structure (I will use the first record as an example):

  • (red) START byte: 0x5A (always 1 byte, fixed value 0x5A)
  • (green) LENGTH bytes: 0x00 0x16 (always 2 bytes; the value can range from "0x00 0x02" to "0xFF 0xFF")
  • (blue) CONTENT: the number of bytes indicated by the decimal value of the LENGTH field minus 2. In this case the LENGTH field value is 22 (0x00 0x16 converted to decimal), therefore the CONTENT will contain 20 (22 - 2) bytes.

My goal is to read each record one by one and write it to an output file. Currently I have a read function and a write function (some pseudocode):

private void Read(BinaryReader binaryReader, BinaryWriter binaryWriter)
{
    const byte START = 0x5A;

    // PeekChar decodes bytes as characters and can throw on arbitrary
    // binary data, so compare stream positions instead
    while (binaryReader.BaseStream.Position < binaryReader.BaseStream.Length)
    {
        //Check the first byte, which should be equal to 0x5A
        if (binaryReader.ReadByte() != START)
        {
            throw new Exception("0x5A Expected");
        }

        //Extract the length field value
        byte[] length = binaryReader.ReadBytes(2);

        //Convert the length field to decimal
        int decimalLength = GetLength(length);

        //Extract the content field value
        byte[] content = binaryReader.ReadBytes(decimalLength - 2);

        //DO WORK
        //modifying the content

        //Writing the record
        Write(binaryWriter, content, length, START);
    }
}

private void Write(BinaryWriter binaryWriter, byte[] content, byte[] length, byte START)
{
    binaryWriter.Write(START);
    binaryWriter.Write(length);
    binaryWriter.Write(content);   
}
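
The GetLength helper is not shown above; assuming the big-endian interpretation from the example (0x00 0x16 → 22), it could be as simple as this sketch:

```csharp
using System;

// Sketch of a GetLength helper (my implementation is not shown above),
// assuming the big-endian byte order from the example: 0x00 0x16 -> 22
static int GetLength(byte[] length)
{
    return (length[0] << 8) | length[1];
}

Console.WriteLine(GetLength(new byte[] { 0x00, 0x16 })); // 22
```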

This is actually working. However, since I am dealing with very large files, I find it performs poorly, because I read and write three times for each record. I would like to read big chunks of data instead of small amounts of bytes, and maybe work in memory, but my experience with streams stops at BinaryReader and BinaryWriter. Thanks in advance.

Duncan_McCloud
  • 543
  • 9
  • 24

2 Answers

2

FileStream is already buffered, so I'd expect it to work pretty well. You could always create a BufferedStream around the original stream to add more buffering if you really need to, but I doubt it would make a significant difference.
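A minimal sketch of that wrapping, if you want to try it (the file names and the 64 KB buffer size are arbitrary placeholders):

```csharp
using System.IO;

// Arbitrary 64 KB buffers around the underlying FileStreams;
// the file names are placeholders for your actual paths
using (var input = new BufferedStream(File.OpenRead("input.bin"), 64 * 1024))
using (var output = new BufferedStream(File.Create("output.bin"), 64 * 1024))
using (var reader = new BinaryReader(input))
using (var writer = new BinaryWriter(output))
{
    Read(reader, writer); // the Read method from the question
}
```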

You say it's "not performing at all" - how fast is it working? How sure are you that the IO is where your time is going? Have you performed any profiling of the code?

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • FileStream has a constructor parameter that controls buffer size. Another thing to consider is that the operating system itself has buffering as well. – James Johnston Nov 04 '11 at 17:47
  • @JamesJohnston: Indeed. I'd expect the defaults to be fine for most scenarios. I doubt that this is really the problem. – Jon Skeet Nov 04 '11 at 17:50
  • I agree. I did run into one exception though, albeit not with C#. The C runtime I was using defaulted to a piddling 512 byte buffer. So disk performance would be extremely slow when reading the data file 4 bytes at a time (which our program did). Of course, this isn't usually an issue because the operating system buffers reads even if the program does not do an adequate job. – James Johnston Nov 04 '11 at 18:25
  • Network drives present a special situation however. Read-ahead can be dangerous: another client can write, invalidating the client's buffer. Windows uses opportunistic locking (oplocks) to still safely do read-ahead buffering over a network. To make a long story short, some legacy file-based systems (Paradox/MS Access/QuickBooks/etc.) can experience severe file corruption if a network connection is lost. These people might disable oplocks to preserve database integrity. When that happened, our read performance went down the tubes because only the C runtime 512 byte buffer was used. – James Johnston Nov 04 '11 at 18:27
  • However, if the OP is not working over a network drive with oplocks disabled, then the problem is most likely elsewhere and the OP probably needs to profile the code more. – James Johnston Nov 04 '11 at 18:28
1

I might also suggest that you read 3 (or 6?) bytes initially, instead of 2 separate reads. Put the initial bytes in a small array, check the 0x5A check byte, then the 2-byte length indicator, then the 3-byte AFP op-code, THEN read the remainder of the AFP record.

It's a small difference, but it gets rid of one of your read calls.
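
A sketch of that combined header read, assuming the big-endian length from the question (the sample record here is made up for illustration):

```csharp
using System;
using System.IO;

// Sample record per the question: start byte, big-endian length 22,
// then 20 content bytes (zeros here, just for illustration)
byte[] record = new byte[23];
record[0] = 0x5A;
record[1] = 0x00;
record[2] = 0x16;

using (var binaryReader = new BinaryReader(new MemoryStream(record)))
{
    // Read the 0x5A check byte and the 2-byte length in a single call
    byte[] header = binaryReader.ReadBytes(3);
    if (header.Length < 3 || header[0] != 0x5A)
        throw new InvalidDataException("0x5A Expected");

    // Big-endian length, as in the question: 0x00 0x16 -> 22
    int recordLength = (header[1] << 8) | header[2];

    // Remainder of the record (LENGTH counts its own two bytes)
    byte[] content = binaryReader.ReadBytes(recordLength - 2);

    Console.WriteLine(recordLength);   // 22
    Console.WriteLine(content.Length); // 20
}
```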

I'm no Jon Skeet, but I did work at one of the biggest print & mail shops in the country for quite a while, and we did mostly AFP output :-)

(usually in C, though)

Roboprog
  • 3,054
  • 2
  • 28
  • 27