
I'm trying to edit some records in a binary file, but I just can't seem to get the hang of it.

I can read the file, but then I can't find the position of the record I want to edit, so I can't replace it.

This is my code so far:

public MyModel Put(MyModel exMyModel)
{
        List<MyModel> list = new List<MyModel>();

        try
        {
            IFormatter formatter = new BinaryFormatter();

            using (Stream stream = new FileStream(_exMyModel, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                while (stream.Position < stream.Length)
                {
                    var obj = (MyModel)formatter.Deserialize(stream);
                    list.Add(obj);
                }
            
                MyModel mymodel = list.FirstOrDefault(i => i.ID == exMyModel.ID);
                mymodel.FirstName = exMyModel.FirstName;
                mymodel.PhoneNumber = exMyModel.PhoneNumber;
                
                // Now I want to update the current record with this new object
                // ... code to update
            }

            return exMyModel;
        }
        catch (Exception ex)
        {
            Console.WriteLine("The error is " + ex.Message);
            return null;
        }
}

I'm really stuck here, guys. Any help would be appreciated.

I already checked other related answers, but none of them solved my problem.

Thank you in advance :)

  • You can't replace a string value on a serialized stream without at least rewriting that record and *all* records afterwards, since the length might have changed. Why not just use a DB? – Martheen Mar 04 '21 at 10:34
  • @Martheen It's a coding task, I have to use these binary files in this case. Can you please show me an example on how I can achieve such a result, you're saying? – Idev Dev Mar 04 '21 at 10:47
  • Oh... is this homework? (I'll still answer, but homework answers are teachable moments :P) – Corey Mar 04 '21 at 10:51
  • @Corey No, It's a project I'm working on for another company, and they required I do this simple project using binary-based files, to store, read and update some data. Can you please provide me with a little help? – Idev Dev Mar 04 '21 at 10:54
  • I've made a four layer application for this little project, just to impress them, and now I'm stuck in this "little" problem. I only have 2 hours to deliver :P But I just didn't get much the chance to work with binary files before – Idev Dev Mar 04 '21 at 10:57
  • This is **really really important**: do not use `BinaryFormatter`. Pretty much ever. It is incredibly dangerous, and it *will* hurt you - the only question is "when". It also isn't supported in current .NET versions (meaning: .NET 5, .NET Core, etc) – Marc Gravell Mar 04 '21 at 11:02
  • "I only have 2 hours to deliver" - what you're trying to do requires a fundamental rethink of *what* you're doing - it seems like more than a 2 hour problem to me (and I'm "somewhat familiar" with binary serialization) – Marc Gravell Mar 04 '21 at 11:05
  • @IdevDev `binary-based files` is meaningless. Even a text file is binary. Without knowing the *format* of a file you can't read or write anything. Some binary formats have no records, eg an image. Some binary file formats require reading everything in memory and saving the entire file again. Others allow you to find specific records and modify them, typically by using fixed-length records. – Panagiotis Kanavos Mar 04 '21 at 11:07
  • BSON is the binary equivalent of JSON, which makes serialization relatively easy. Protocol Buffers is another similar format. HDF5 is a binary format for large data sets that allows modifications. Similar big-data formats are Avro, Orc and Parquet. With big data, you can't rewrite the file to modify a record, or even read the entire file to find a record – Panagiotis Kanavos Mar 04 '21 at 11:09
  • Oh, so I will write down the requirement. Maybe I was wrong in understanding it. The requirement is: `The requirement is a binary file because they are better than text-based files in terms of speed and accuracy.` So I assumed I'd just use a .bin file for this project. I didn't give much thought to it... Am I wrong? – Idev Dev Mar 04 '21 at 11:14
  • "better" is subjective and contextual; *just about anything* would be better than using `BinaryFormatter`, however; a range of well supported serialization frameworks exist using a range of formats - both text and binary; but editing the middle of a file without rewriting everything that comes afterwards: requires serious design – Marc Gravell Mar 04 '21 at 11:16

2 Answers


I would recommend just writing all the objects back to the stream. You could perhaps write only the changed object and each record after it, but I would not bother.

Start by resetting the stream: `stream.Position = 0`. You can then write a loop and serialize each object using `formatter.Serialize(stream, obj)`.
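A minimal sketch of that approach, untested and hedged: `MyModel`, the field names, and the general shape are taken from the question, and it assumes `MyModel` is marked `[Serializable]` and that your runtime still permits `BinaryFormatter` (recent .NET versions refuse it):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyModel
{
    public int ID;
    public string FirstName;
    public string PhoneNumber;
}

public static class PhoneBook
{
    // Rewrite-everything update: deserialize all records, patch the one that
    // matches, then serialize the whole list back and truncate the file.
    public static void Update(string path, MyModel exMyModel)
    {
        var formatter = new BinaryFormatter();
        var list = new List<MyModel>();

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            while (stream.Position < stream.Length)
                list.Add((MyModel)formatter.Deserialize(stream));

            var match = list.FirstOrDefault(i => i.ID == exMyModel.ID);
            if (match != null)
            {
                match.FirstName = exMyModel.FirstName;
                match.PhoneNumber = exMyModel.PhoneNumber;
            }

            stream.Position = 0;                  // rewind and rewrite every record
            foreach (var obj in list)
                formatter.Serialize(stream, obj);
            stream.SetLength(stream.Position);    // drop leftover bytes if the data shrank
        }
    }
}
```

The `SetLength` call at the end matters: if the updated record serializes shorter than before, stale bytes would otherwise remain after the last record and corrupt the next read.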

If this is a coding task I guess you have no choice in the matter. But you should know that BinaryFormatter has various problems. It more or less saves the objects the same way they are stored in memory. This is inefficient, insecure, and changes to the classes may prevent you from deserializing stored objects. The most common serialization format today is JSON, but there are also binary alternatives like protobuf.net.

JonasH
  • Oh Jonas, if only you came earlier in my life :P... The protobuf.net seems like a nice alternative but I just don't have the time to dig into it. I think I will try to go with your first alternative. I will let you know on how it goes – Idev Dev Mar 04 '21 at 11:41

How you update the file is going to rely pretty heavily on whether or not your records serialize as fixed length.

Variable-Length Records

Since you're using strings in the record, any change in string length (as serialized bytes), or any other change that affects the length of the serialized object, will make it impossible to do an in-place update of the record.

With that in mind you're going to have to do some extra work.

First, test the objects inside the read loop. Capture the current position before you deserialize each object, test the object for equivalence, and save that offset when you find the record you're looking for. Then deserialize the rest of the objects in the stream... or copy the rest of the stream to a MemoryStream instance for later.

Next, seek back to the start position of the record you're updating and truncate the file there with `stream.SetLength` (the `Length` property itself has no setter). Serialize the new copy of the record into the stream, then copy the MemoryStream that holds the rest of the records back into the stream... or capture and serialize the rest of the objects.

In other words (untested but showing the general structure):

public MyModel Put(MyModel exMyModel)
{
    try
    {
        IFormatter formatter = new BinaryFormatter();
        using (Stream stream = File.Open(_exMyModel, FileMode.Open, FileAccess.ReadWrite))
        using (var buffer = new MemoryStream())
        {
            long location = -1;
            while (stream.Position < stream.Length)
            {
                var position = stream.Position;
                var obj = (MyModel)formatter.Deserialize(stream);
                if (obj.ID == exMyModel.ID)
                {
                    location = position;
                    // stash the rest of the file, then truncate at the record start
                    stream.CopyTo(buffer);
                    buffer.Position = 0;
                    stream.SetLength(position);
                    stream.Position = position;
                }
            }
            // write the updated record, then restore the stashed remainder
            formatter.Serialize(stream, exMyModel);
            if (location >= 0 && buffer.Length > 0)
            {
                buffer.CopyTo(stream);
            }
        }
        return exMyModel;
    }
    catch (Exception ex)
    {
        Console.WriteLine("The error is " + ex.Message);
        return null;
    }
}

Note that in general a MemoryStream holding the serialized data will be faster and take less memory than deserializing the records and then serializing them again.

Static-Length Records

This is unlikely, but if your record type is annotated in such a way that it always serializes to the same number of bytes, then you can skip everything to do with the MemoryStream and truncating the binary file. In this case just read records until you find the right one, rewind the stream to that position (after the read) and write a new copy of the record.
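Under that fixed-size assumption, the update collapses to remembering the record's start offset, seeking back, and overwriting. A rough sketch (the `MyModel` shape is hypothetical, borrowed from the question; this is only safe if you've verified that the replacement really serializes to the identical byte count):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyModel
{
    public int ID;
    public string FirstName;
    public string PhoneNumber;
}

public static class FixedLengthUpdate
{
    // Seek-and-overwrite: only valid when the replacement serializes to
    // exactly the same number of bytes as the record it replaces.
    public static bool Update(string path, MyModel exMyModel)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            while (stream.Position < stream.Length)
            {
                long position = stream.Position;          // record start offset
                var obj = (MyModel)formatter.Deserialize(stream);
                if (obj.ID != exMyModel.ID) continue;

                stream.Position = position;               // rewind to the record
                formatter.Serialize(stream, exMyModel);   // overwrite in place
                return true;
            }
        }
        return false;   // no record with that ID
    }
}
```

If the new record serializes even one byte longer, it will bleed into the next record and corrupt everything after it, which is exactly why this path needs the extensive testing described below.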

You'll have to examine the classes yourself to see what sort of serialization modifier attributes are on the string properties, and I'd suggest testing this extensively with different string values to ensure that you're actually getting the same data length for all of them. Adding or removing a single byte will screw up the remainder of the records in the file.

Edge Case - Same Length Strings

Since replacing a record with data that's the same length only requires an overwrite, not a rewrite of the file, you might get some use out of testing the record length before grabbing the rest of the file. If you get lucky and the modified record is the same length then just seek back to the right position and write the data in-place. That way if you have a file with a ton of records in it you'll get a much faster update whenever the length is the same.
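One way to sketch that check: serialize the replacement into a scratch `MemoryStream` first and compare sizes before deciding. A hedged example, where `start` and `end` are the offsets you captured around the old record while reading (not part of the original code):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class SameLengthCheck
{
    // Serialize the replacement into a scratch buffer first. If it comes out
    // the same size as the old record, overwrite in place; otherwise report
    // failure so the caller can fall back to truncate-and-rewrite.
    public static bool TryOverwrite(Stream stream, long start, long end,
                                    BinaryFormatter formatter, object record)
    {
        using (var scratch = new MemoryStream())
        {
            formatter.Serialize(scratch, record);
            if (scratch.Length != end - start)
                return false;               // different size: needs a full rewrite

            stream.Position = start;
            scratch.Position = 0;
            scratch.CopyTo(stream);         // same size: cheap in-place overwrite
            return true;
        }
    }
}
```

The caller only pays for the buffer copy when the fast path applies; when it doesn't, nothing has been written yet, so the slow truncate-and-rewrite path can proceed from a clean state.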

Changing Format...

You said that this is a coding task so you probably can't take this option, but if you can alter the storage format... let's just say that BinaryFormatter is definitely not your friend. There are much better ways to do it if you have the option. SQLite is my binary format of choice :)

Actually, since this appears to be a coding test you might want to make a point of that. Write the code they asked for, then if you have time write a better format that doesn't rely on BinaryFormatter, or throw SQLite at the problem. Using an ORM like LinqToDB makes SQLite trivial. Explain to them that the file format they're using is inherently unstable and should be replaced with something that is stable, supported, and efficient.

Corey
  • Hello Corey, thanks for the answer. I took it, made a few adjustments, and tested it, but the code creates a new record, it won't update the current one. One of the other requirements was not to use SQLite, because I thought of that. Do you think it's a good idea, just in this case, to update the record after I load the whole file, update it via LINQ and then rewrite the whole file again with the changes? – Idev Dev Mar 04 '21 at 11:31
  • @IdevDev This performs an update-or-append operation. If it always appends a new record then the ID of the supplied record is not found in the file. – Corey Mar 04 '21 at 11:44
  • It finds the ID, but all the same it continues to append another record with the same ID to the file, with the changed data. – Idev Dev Mar 04 '21 at 11:48
  • How did you resolve the error with `stream.Position = stream.Length = position;`? Should be `stream.SetLength(stream.Position = position);` or split that into two lines. I just tested the thing in LINQPad and it works except for that. Oh, and with a really simple record type. – Corey Mar 04 '21 at 11:58
  • Hello Corey, I had split it into two parts. I managed to make it work. Thank you. I will test a few more times and then accept your answer. – Idev Dev Mar 04 '21 at 12:13