
I'm new to programming in general, and my understanding of programming concepts is still growing. This question is about learning, so please provide enough information for me to learn from, but not so much that I get lost. Thank you. (I would also like input on how to make the code reusable within the project.)

The goal of the project I'm working on consists of:

  1. Read a binary file.

  2. Read values at known offsets to locate a particular chunk of data within the file.

  3. The first value is the first 4 bytes of the file (the offset of the end of my chunk).

  4. The second value is 16 bytes from the end of the file; I read 4 bytes there (the size of the chunk, in hex).

  5. The third value is the 4 bytes immediately following the previous one (the offset of the start of the chunk, in hex).

  6. Locate the parts of the chunk to modify by searching for ASCII text as well as by offsets.

Now I have the start offset, end offset and size of my chunk. This should allow me to read bytes from file into a byte array and know the size of the array ahead of time.
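The steps above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the `ChunkLayout` class and `ReadLayout` method are hypothetical names, and it assumes the three values are unsigned 32-bit little-endian integers (the byte order `BinaryReader` always uses), which the question does not state.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkLayout
{
    // Hypothetical helper: reads the three values described in the steps above.
    // Assumption: each value is a 32-bit little-endian unsigned integer.
    public static void ReadLayout(Stream stream, out uint chunkEnd, out uint chunkSize, out uint chunkStart)
    {
        // leaveOpen: true so disposing the reader does not close the caller's stream.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            stream.Seek(0, SeekOrigin.Begin);
            chunkEnd = reader.ReadUInt32();   // first 4 bytes of the file

            stream.Seek(-16, SeekOrigin.End);
            chunkSize = reader.ReadUInt32();  // 4 bytes starting 16 bytes before the end
            chunkStart = reader.ReadUInt32(); // the 4 bytes that follow
        }
    }
}
```

If the layout really is as described, `chunkStart + chunkSize` should equal `chunkEnd`, which gives a cheap sanity check before any bytes are modified.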

(Questions: 1. Is knowing the size important, other than for verification? 2. Is reading part of a file into a byte array, changing the bytes, and then overwriting that part of the file the best method?)

So far I have managed to read the offsets from the file using BinaryReader on a MemoryStream. I then locate the chunk of data I need and read that into a byte array.

I'm stuck in several ways:

  • What are the best practices for binary reading and writing?
  • What's the best storage convention for the data that is read?
  • When I need to modify bytes, how do I go about that?
  • Should I be using FileStream?
svick

1 Answer


Since you want to both read and write, it makes sense to use the FileStream class directly (using FileMode.Open and FileAccess.ReadWrite). See FileStream on MSDN for a good overall example.

  1. You do need to know the number of bytes that you are going to be reading from the stream. Note that Read can return fewer bytes than you asked for, so check its return value. See the FileStream.Read documentation.
  2. Fundamentally, you have to read the bytes into memory at some point if you're going to use and later modify their contents. So you will have to make an in-memory copy (using the Read method is the right way to go if you're reading a variable-length chunk at a time).

As for best practices, always dispose your streams when you're done; e.g.:

using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
    //Do work with the FileStream here.
}

If you're going to do a large amount of work, you should be doing the work asynchronously. (Let us know if that's the case.)

And, of course, check the FileStream.Read documentation and also the FileStream.Write documentation before using those methods.

Reading bytes is best done by pre-allocating an in-memory array of bytes with the length that you're going to read, then reading those bytes. The following will read the chunk of bytes that you're interested in, let you do work on it, and then replace the original contents (assuming the length of the chunk hasn't changed):

EDIT: I've added a helper method to do work on the chunk, per the comments on variable scope.

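using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
    var chunk = new byte[numOfBytesInChunk];

    // offsetOfChunkInFile is the start offset you've already read from the file.
    stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);

    // Read may return fewer bytes than requested, so loop until the chunk is full.
    int totalRead = 0;
    while (totalRead < numOfBytesInChunk)
    {
        int read = stream.Read(chunk, totalRead, numOfBytesInChunk - totalRead);
        if (read == 0) throw new EndOfStreamException("File ended before the chunk was fully read.");
        totalRead += read;
    }

    DoWorkOnChunk(ref chunk);

    // Go back to the start of the chunk and overwrite it with the modified bytes.
    stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);
    stream.Write(chunk, 0, numOfBytesInChunk);
}

private void DoWorkOnChunk(ref byte[] chunk)
{
    //TODO: Any mutation done here to the data in 'chunk' will be written out to the stream.
}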
Lars Kemmann
  • Thank you for your reply. At most I would guess the max size of the chunk to modify would be 2KB. I do not see it being larger than that. Do you think I would still need to do this operation asynchronously? I would also like to be able to do work on data in the chunk at an unknown offset, the way to find this data is to locate ASCII words within that chunk. Would I loop through a search and replace operation? How would this affect the memory data within chunk variable? (Scope is a bit confusing to me currently, still reading on it.) – Drew Birmingham Jun 07 '12 at 23:34
  • You only need asynchrony if your application performance is too low (either the UI is being blocked unacceptably, or you can't afford an uneven load distribution on your machine - say, because you're processing a large volume of data, in the megabytes-per-minute maybe). A single 2KB operation is completely trivial - I guarantee you my browser is using far more than that to render this comment box. :) – Lars Kemmann Jun 08 '12 at 04:49
  • As for locating the offset within the chunk, you can start by doing a nested loop - O(nm) performance, n being string length and m being the search pattern length. There are many ways to optimize this - check http://en.wikipedia.org/wiki/String_searching_algorithm – Lars Kemmann Jun 08 '12 at 04:50
  • Finally, as for how you'd affect the memory data in the chunk: once you've called FileStream.Read(), you have a *mutable* copy available to you in the array. You can do whatever you want to that in-memory data, and then overwrite the source file at the end as I showed in the second code snippet. In terms of scope, note that the variable 'chunk' is only in scope within the using { } block, so you can't modify it once that block is exited. (Makes sense, since then you've closed the file.) You can call a helper function in the //TODO: line. Pass 'chunk' to it as a "ref byte[]". – Lars Kemmann Jun 08 '12 at 04:53
  • The best approach to doing the search is almost certainly to use a regular expression (.NET BCL provides the Regex class for this). Let me see if I can find an example of using Regex with a byte array... – Lars Kemmann Jun 08 '12 at 05:01
  • Check out Darin Dimitrov's answer for how to convert a byte[] to a string for searching. You'd still have to do the modification to the byte array, but this will get you the offsets you need to do that. (Just use Encoding.ASCII instead of Encoding.UTF8.) http://stackoverflow.com/questions/8907911/are-there-any-well-know-regex-libraries-for-net-specifically-for-byte-arrays – Lars Kemmann Jun 08 '12 at 05:05
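To make the search suggestions in the comments above concrete, here is a minimal sketch of both approaches plus an in-place overwrite. The `ChunkEditor` class and its method names are hypothetical and not part of the answer. It relies on `Encoding.ASCII` decoding every byte to exactly one character (bytes above 0x7F become `?`), so an index in the decoded string is also an index in the byte array. The overwrite assumes the replacement fits inside the chunk, so the chunk length never changes.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class ChunkEditor
{
    // Nested-loop search: index of the first occurrence of pattern in data, or -1.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;
        for (int i = 0; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Regex search: decode the bytes as ASCII (one char per byte) and
    // return the index of the first match, or -1 if there is none.
    public static int IndexOfRegex(byte[] data, string regexPattern)
    {
        string text = Encoding.ASCII.GetString(data);
        Match match = Regex.Match(text, regexPattern);
        return match.Success ? match.Index : -1;
    }

    // Overwrites bytes in place, starting at offset, with the ASCII bytes of
    // asciiText. The array length is unchanged.
    public static void Overwrite(byte[] chunk, int offset, string asciiText)
    {
        byte[] replacement = Encoding.ASCII.GetBytes(asciiText);
        if (offset < 0 || offset + replacement.Length > chunk.Length)
            throw new ArgumentOutOfRangeException("offset");
        Buffer.BlockCopy(replacement, 0, chunk, offset, replacement.Length);
    }
}
```

Something like this could be called from inside the DoWorkOnChunk helper: find the offset of a known ASCII word, then overwrite the bytes that follow it. For a fixed word, the nested-loop version is probably enough at the 2KB size mentioned; the regex version mainly helps when the text to find varies.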