
I have a large binary file that contains different data types. I can access single records in the file, but I am not sure how to loop over the binary values and load them into a memory stream byte by byte.

I have been using BinaryReader:

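using System;
using System.IO;
using System.Text;
using (BinaryReader binReader = new BinaryReader(File.Open(fileName, FileMode.Open)))
{
    Encoding ascii = Encoding.ASCII;             // declared but never used
    string authorName = binReader.ReadString();  // expects a length-prefixed string at the current position
    Console.WriteLine(authorName);
}
Console.ReadLine();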

But this won't work, since I have a large file with many different data types. Simply put, I need to read the file byte by byte and then interpret the data, whether it's a string or something else.

I would appreciate any thoughts that can help.

  • What do you mean by 'loop over the binary values and load it in the memory stream byte by byte'? There is no MemoryStream in your code. – the.Doc May 29 '20 at 14:41
  • @the.Doc If I knew how to load it into a memory stream, I wouldn't be asking the question, buddy! –  May 29 '20 at 14:43
  • You've not told us why you want to load it into a memory stream? – the.Doc May 29 '20 at 14:44

2 Answers


Here's a simple bit of code that shows the most basic way of doing it.

using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace binary_read
{
    class Program
    {
        private static readonly int bufferSize = 1024;

        static async Task Main(string[] args)
        {
            var totalBytes = 0L;
            // Reuse one buffer instead of allocating a new one on every read.
            var buffer = new byte[bufferSize];

            using (var stream = File.OpenRead(args.First()))
            {
                int bytesRead;
                // ReadAsync returns 0 at end of stream. It may also return fewer than
                // bufferSize bytes before that, so only the first bytesRead bytes are valid.
                while ((bytesRead = await stream.ReadAsync(buffer, 0, bufferSize)) > 0)
                {
                    totalBytes += bytesRead;

                    // Process buffer[0] .. buffer[bytesRead - 1]
                }

                Console.WriteLine($"Processed {totalBytes} bytes.");
            }
        }
    }
}

The main bit to take note of is within the using block.

First, when working with files, streams or sockets, it's best to use a using block where possible so that you clean up after yourself deterministically.

Then it's really just a matter of calling Read/ReadAsync on the stream if you're only after the raw data. However, there are various 'readers' that provide an abstraction to make working with certain formats easier.
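If you really do want the raw bytes in a MemoryStream, as the question asks, a minimal sketch follows. RawLoader and LoadIntoMemory are hypothetical names, and it assumes the whole file fits in memory, which for a very large file may not be true (in that case, read in chunks as in the loop above).

```csharp
using System.IO;

// Hypothetical helper, not part of the answer above.
static class RawLoader
{
    // Copies every byte from `source` into a MemoryStream and rewinds it.
    // Only sensible when the data fits comfortably in memory.
    public static MemoryStream LoadIntoMemory(Stream source)
    {
        var memory = new MemoryStream();
        source.CopyTo(memory);   // CopyTo calls Read repeatedly until it returns 0
        memory.Position = 0;     // rewind so callers start reading at byte 0
        return memory;
    }
}
```

Usage would be along the lines of using (var file = File.OpenRead(fileName)) using (var memory = RawLoader.LoadIntoMemory(file)) { ... }, after which a BinaryReader can be layered over memory.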

So if you know that you're going to be reading ints, doubles and strings, you can use the BinaryReader and its ReadInt16/ReadInt32/ReadInt64, ReadDouble and ReadString methods.
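A minimal sketch of that, which also covers reading values back out of a byte array: wrap the array in a MemoryStream and read the fields in exactly the order they were written. The layout (an int, a double, then a string) and the MixedTypes helper are hypothetical; the writer half exists only so there is something to read.

```csharp
using System.IO;
using System.Text;

// Hypothetical layout: Int32 id, Double price, length-prefixed string title.
static class MixedTypes
{
    // Produces sample bytes in that layout using BinaryWriter.
    public static byte[] WriteSample(int id, double price, string title)
    {
        using (var memory = new MemoryStream())
        {
            using (var writer = new BinaryWriter(memory, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(id);     // 4 bytes, little-endian
                writer.Write(price);  // 8 bytes
                writer.Write(title);  // 7-bit encoded length prefix, then UTF-8 bytes
            }
            return memory.ToArray();
        }
    }

    // Reads the fields back from a byte array, in the same order they were written.
    public static (int Id, double Price, string Title) ReadSample(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            int id = reader.ReadInt32();
            double price = reader.ReadDouble();
            string title = reader.ReadString();
            return (id, price, title);
        }
    }
}
```

The reader has no way to discover the layout on its own; if the order or types differ from what was written, you get garbage or an exception rather than an error message.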

If you're reading into a struct, you can read the fields one at a time in a loop, as suggested by @JonasH above, or use the method in this answer.

teambanana
  • That function is exactly what I am looking for, but how do I then read the variables' data from the byte array? –  Jun 02 '20 at 09:24
  • Can you update the question with this? If you add the struct to the sample it would be really useful as well. – teambanana Jun 02 '20 at 12:57

This will very much depend on what format the file is in. Each byte in the file might represent different things, or it might just represent values from a large array, or some mix of the two.

You need to know what the format looks like to be able to read it, since binary files are not self-descriptive. Reading a simple object might look like:

var authorName = binReader.ReadString();
var publishDate = DateTime.FromBinary(binReader.ReadInt64());
... 

If you have a list of items, it is common to use a length prefix. Something like:

var numItems = binReader.ReadInt32();
for(int i = 0; i < numItems; i++){
    var title = binReader.ReadString();
    ...
}

You would then typically create one or more objects from the data that can be used in the rest of the application, for example:

new Bibliography(authorName, publishDate, books);
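Putting the snippets above together, here is a sketch of a matching writer and reader. Bibliography is only the answer's placeholder, so its shape here (an author, a date, a list of titles) is assumed, as is the BibliographyFormat helper; the point is that the reader mirrors the writer field for field, including the Int32 length prefix for the list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical shape for the answer's Bibliography placeholder.
sealed class Bibliography
{
    public string AuthorName { get; }
    public DateTime PublishDate { get; }
    public List<string> Books { get; }

    public Bibliography(string authorName, DateTime publishDate, List<string> books)
    {
        AuthorName = authorName;
        PublishDate = publishDate;
        Books = books;
    }
}

static class BibliographyFormat
{
    public static void Write(BinaryWriter writer, Bibliography b)
    {
        writer.Write(b.AuthorName);
        writer.Write(b.PublishDate.ToBinary()); // ToBinary pairs with DateTime.FromBinary
        writer.Write(b.Books.Count);            // length prefix for the list
        foreach (var title in b.Books)
            writer.Write(title);
    }

    public static Bibliography Read(BinaryReader reader)
    {
        var authorName = reader.ReadString();
        var publishDate = DateTime.FromBinary(reader.ReadInt64());
        var numItems = reader.ReadInt32();
        var books = new List<string>(numItems);
        for (int i = 0; i < numItems; i++)
            books.Add(reader.ReadString());
        return new Bibliography(authorName, publishDate, books);
    }
}
```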

If this is a format you do not control, I hope you have a detailed specification. Otherwise this is pretty much a lost cause for anything but the kludgiest solutions.

If there is more data than can fit in memory, you need some kind of streaming mechanism: read one item, process it, save the result, read the next item, and so on.
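One way to sketch that streaming loop is an iterator that yields one item at a time, so only the current item is held in memory. This assumes, hypothetically, that the file is nothing but consecutive BinaryWriter strings and that the stream is seekable (a FileStream is), since it compares Position against Length to detect the end.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: assumes the stream holds only consecutive length-prefixed strings.
static class ItemStream
{
    // Yields one item at a time; nothing beyond the current item is kept in memory.
    public static IEnumerable<string> ReadTitles(Stream stream)
    {
        // leaveOpen: the caller owns the stream and disposes it.
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            // Requires a seekable stream so Position and Length are available.
            while (stream.Position < stream.Length)
                yield return reader.ReadString();
        }
    }
}
```

A caller would wrap it as using (var file = File.OpenRead(fileName)) { foreach (var title in ItemStream.ReadTitles(file)) { /* process, save, move on */ } }.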

If you do control the format, I would suggest alternatives that are easier to manage. I have used protobuf-net and find it quite easy to use, but there are other alternatives. The common way to use these kinds of libraries is to create a class for the data and add attributes to the fields that should be stored. The library then handles serialization and deserialization automatically, and usually deals with things like inheritance and changes to the format in an easy way.

JonasH
  • The thing is, I need to import the data into a struct. –  Jun 01 '20 at 12:45
  • Then read out each primitive and create the struct when you have all the data for it. If the struct contains only primitive data, you might be able to read out a buffer and [convert the buffer directly to your struct](https://stackoverflow.com/questions/2871/reading-a-c-c-data-structure-in-c-sharp-from-a-byte-array). This might be a bit faster, but it is not applicable if the type contains strings or arrays. – JonasH Jun 01 '20 at 15:23
  • Can you explain it a bit better, please? –  Jun 01 '20 at 15:56
  • If you have a struct composed of X, Y, and Z. Then do ReadSingle() three times and put the result into temporary variables x, y, and z, and then create the struct like new MyStruct(x, y, z). Not sure how to explain it any simpler. – JonasH Jun 01 '20 at 17:57
  • And what if I need to read 100? Shall I repeat that? No, I don't think so. –  Jun 02 '20 at 05:49
  • If you have 100 items, then read the items in a loop that runs 100 times. I'm not sure what the problem is here. – JonasH Jun 02 '20 at 07:35
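The approach described in these comments (read each primitive into a temporary, then build the struct) wrapped in a loop so it handles 100 items or any other number. MyStruct, its three float fields X/Y/Z, and the Int32 count written before the items are assumptions drawn from the comment's example, not a known file format.

```csharp
using System.IO;

// Hypothetical struct from the comment: three floats X, Y and Z.
readonly struct MyStruct
{
    public readonly float X, Y, Z;
    public MyStruct(float x, float y, float z) { X = x; Y = y; Z = z; }
}

static class StructReader
{
    // Assumes an Int32 count followed by `count` triples of Single values.
    public static MyStruct[] ReadAll(BinaryReader reader)
    {
        int count = reader.ReadInt32();
        var items = new MyStruct[count];
        for (int i = 0; i < count; i++)
        {
            // Read each primitive into a temporary, then build the struct.
            float x = reader.ReadSingle();
            float y = reader.ReadSingle();
            float z = reader.ReadSingle();
            items[i] = new MyStruct(x, y, z);
        }
        return items;
    }
}
```

If the count is not stored in the file, the loop can instead run until the stream's Position reaches its Length.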