
I write the contents of a dictionary to a file using the following C# code.

ConcurrentDictionary<string, DateTime> _jobsAck;
public void SaveToDisk()
{
    var binaryFormatter = new BinaryFormatter();
    // FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing bytes behind
    using (var stream = File.Open(BINARY_FILENAME, FileMode.Create))
    {
        binaryFormatter.Serialize(stream, _jobsAck);
    }
}

I read this file back and deserialize it using the following C# code.

public void LoadFromDisk()
{
    if (!File.Exists(BINARY_FILENAME)) return;

    var binaryFormatter = new BinaryFormatter();
    using (var stream = File.Open(BINARY_FILENAME, FileMode.Open, FileAccess.Read))
    {
        var deserializedStream = binaryFormatter.Deserialize(stream);
        _jobsAck = deserializedStream as ConcurrentDictionary<string, DateTime>;
        if (_jobsAck == null)
        {
            _jobsAck = new ConcurrentDictionary<string, DateTime>();
            if (!(deserializedStream is Dictionary<string, DateTime> ackDict)) return;
            foreach (var pair in ackDict)
            {
                _jobsAck.TryAdd(pair.Key, pair.Value);
            }
        }
    }
}

We have been asked not to use BinaryFormatter because of its security issues. Is there an alternative that can read and write in a binary format?

.NET Framework version: 4.7.2

Vivek Nuna
  • Use JSON serializer/deserializer. – jdweng Aug 22 '23 at 12:19
  • https://www.nuget.org/packages/MessagePack/ – Jesús López Aug 22 '23 at 12:20
  • Looks like it depends what data you are handling: https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide#binaryformatter-security-vulnerabilities – Matt Evans Aug 22 '23 at 12:21
  • @jdweng I cannot use json or xml serializer due to some technical constraints in this case. – Vivek Nuna Aug 22 '23 at 12:21
  • Could you please express your constraints within your question so that we don't have to guess? – Zdeněk Jelínek Aug 22 '23 at 12:28
  • Alternatives based on what criteria? What are the `technical constraints`? What is stored in that file? There are *several* duplicate questions, beyond the options described in the deprecation docs. Parquet, Arrow, Protobuf, MessagePack, ORC, HDF5 etc are all options that can be read by both .NET code and other applications. – Panagiotis Kanavos Aug 22 '23 at 12:28
  • You could even use SQLite or even a plain old CSV file. If the data is a `Dictionary` you can store it into a two-column table or text file. Using `BinaryFormatter` was already wasteful, storing more data than necessary – Panagiotis Kanavos Aug 22 '23 at 12:34
  • Without knowing the constraints, I suggest using a JSON serializer internally but replacing `[` with `<` before outputting the file and calling it `kson` :D – Rand Random Aug 22 '23 at 13:04
  • @PanagiotisKanavos the problem here is that I need to read the binary file first and then write the file later. My binary file already has some data and I don't want to lose it. – Vivek Nuna Aug 22 '23 at 13:04
  • Well, the obvious answer would be: do not overwrite the original file. But why would you lose the data at all? If you want to replace it, check the format of the file; if it's the binary format, read it with BinaryFormatter and "transform" it to whatever. – Rand Random Aug 22 '23 at 13:08
  • @RandRandom I want to completely get rid of BinaryFormatter due to security issues. – Vivek Nuna Aug 22 '23 at 13:11
  • But you apparently can't if you still need to be able to read data formatted with BinaryFormatter. IMHO you can't have it both ways: stay backward compatible with old data written with BinaryFormatter and at the same time get rid of it. At the very least you need to be able to read the data; what you can get rid of is writing data with BinaryFormatter. – Rand Random Aug 22 '23 at 13:12
  • @VivekNuna the security issue is the contents of the file, not the `BinaryFormatter` class itself. That file contains type names and signatures, not just values. It's dangerous because reading from it will try to use whatever type or method matches the stored signature, whether it's the original or not. It will even create any missing types, which could be used instead of the application's own dynamically loaded types. That's why BinaryFormatter is unfixable – Panagiotis Kanavos Aug 22 '23 at 13:25
  • @VivekNuna if that file referred to a class in your target application, deserializing it would create an instance *and* run its constructor, before your code had a chance to check that data. `BinaryFormatter.Deserialize` has no type parameter so it will return whatever is stored in that file. – Panagiotis Kanavos Aug 22 '23 at 13:31
  • @PanagiotisKanavos - do you happen to know if a `Converter.exe` would be possible, that gets executed in a sandbox eg. https://stackoverflow.com/questions/3029214 - does this eliminate the security risks? – Rand Random Aug 22 '23 at 13:36
  • A minimal console application that only deserialized the data and wrote it to a file would be simpler and just as safe. `Deserialize` would still load whatever was in that file, even if it was eg a 1GB array, generating any missing types. A better option, one taken by .NET itself, is to migrate the data and gradually remove BinaryFormatter entirely. After all, the application is already at risk by using it. A gradual migration won't *increase* the risk. – Panagiotis Kanavos Aug 22 '23 at 13:59

2 Answers

3

I created a binary serialization alternative, which is fully compatible with the IFormatter infrastructure, including the ISerializable, IDeserializationCallback, IObjectReference interfaces, the (de)serialization event methods, surrogate selectors and binders.

Security note: Please read the security notes in the Remarks section of the BinarySerializationFormatter class. Even if SafeMode is enabled (which eliminates a lot of the security issues that BinaryFormatter suffers from), there might be some security concerns if the stream contains non-natively supported types that are resolved by name on deserialization. To address these issues, further restrictions will be introduced for SafeMode in the upcoming version. To be completely safe with the current version, see the update below.

You can find the NuGet package here and an online demo here, which also demonstrates how compact the result can be compared to BinaryFormatter.


Update for clarifying the security questions:

Does your package solve the security issue which is with the BinaryFormatter?

TL;DR: Yes, if you use only the types natively supported by the serializer and enable SafeMode on deserialization. Additionally, until the next version is released, it's also recommended to use a safe binder to filter the expected types encoded by assembly qualified name (see an example at the bottom of the answer).

Elaborated answer:

BinaryFormatter is dangerous at multiple levels. Whereas some of these threats can be cured by a reimplementation (e.g. we can prevent auto-loading assemblies referred to by the serialization stream, we can use pessimistic allocation for arrays, strings and collections and increase the capacity dynamically to prevent OutOfMemoryException attacks, etc.), others come from the IFormatter infrastructure itself. For example, serializable types that can have an invalid state in terms of their fields usually do not validate the deserialized data in their deserialization events or in the special constructor. So it is partly the implementer's responsibility to ensure security completely.

The main reason BinaryFormatter "cannot be made secure" is that it's a polymorphic serializer, meaning that an object field can hold any serializable type: just name it in the serialization stream and it will be resolved. It is a common misunderstanding that it's insecure because it uses a binary format, whereas JSON serialization is safe. No, polymorphic JSON serialization (e.g. Newtonsoft's JSON.NET) is also insecure if you allow the type names to be dumped and resolved. That's why System.Text.Json does not support polymorphism automatically, and applying some polymorphism to it can be a pain.

And vice versa, Google's ProtoBuf is safe even though it has a binary format, because it only uses a few primitive types that are not resolved by name but from a closed set of identifiers. The most complex thing you can encode is a list of key-value pairs. In return, it's really hard to serialize a nested object graph with ProtoBuf.

BinarySerializationFormatter attempts to minimize the risks by supporting a lot of types (including collections) natively. These types are encoded without any assembly identity so it ensures both safety and a very compact payload if you don't use any custom types. As long as you use only these types and enable SafeMode on deserialization you are completely safe.

If you are using any custom type (even if it's just a simple enum), the assembly qualified name of the type must be stored, and that name can be manipulated. SafeMode in the current version only ensures that the type is resolved from the already loaded assemblies, so it refuses to load new assemblies during deserialization. But this is safe only if the already loaded assemblies contain no exploitable types. For example, if you target .NET Framework, the TempFileCollection class can be exploited to delete files (this specific attack is now blocked by special handling in SafeMode). Therefore, in the upcoming versions you must explicitly declare in SafeMode the types that are expected to be resolved by name. Until then you can use one of the binders to be even safer. Find an example below.

I have already serialized the data using BinaryFormatter and written it to the file. Now I want to use some other alternative which can read this file and deserialize its content.

The stream formats are not compatible. BinarySerializationFormatter can only assure that if you were able to serialize your objects with BinaryFormatter, then they will also work with my serializer, but the binary stream will be different (in fact, much more compact).

However, if you really need to read the stream serialized by BinaryFormatter, a small step towards security is to use a serialization binder (e.g. this one), which also works with BinaryFormatter. It can ensure that only the expected types are allowed to be deserialized, and if its SafeMode is true, unexpected type names will not be resolved even if the serialization stream contains a manipulated assembly identity with a potentially harmful module initializer, a type with a malicious constructor, etc.
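
To illustrate the binder idea in general terms, here is a minimal sketch of an allow-list binder built only on the framework's `SerializationBinder` base class. It is not the binder from the linked library; the class name `AllowListBinder` is hypothetical, and the exact set of types that appears in a given stream (for a `ConcurrentDictionary<string, DateTime>` this probably includes the dictionary, its key-value array and its comparer) is an assumption you would have to confirm against your own files. A binder narrows the attack surface but is not a complete fix for BinaryFormatter.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical allow-list binder: only types registered up front are ever returned.
public sealed class AllowListBinder : SerializationBinder
{
    private readonly Dictionary<string, Type> _allowed =
        new Dictionary<string, Type>(StringComparer.Ordinal);

    public AllowListBinder(params Type[] allowedTypes)
    {
        foreach (Type type in allowedTypes)
        {
            // Index by the same two strings the formatter passes to BindToType.
            _allowed[Key(type.Assembly.FullName, type.FullName)] = type;
        }
    }

    public override Type BindToType(string assemblyName, string typeName)
    {
        Type type;
        if (_allowed.TryGetValue(Key(assemblyName, typeName), out type))
        {
            // Return the pre-registered Type; nothing is resolved from the stream's names.
            return type;
        }

        // Anything unexpected aborts deserialization instead of being resolved by name.
        throw new SerializationException(
            "Type not allowed: " + typeName + ", " + assemblyName);
    }

    private static string Key(string assemblyName, string typeName)
    {
        return typeName + ", " + assemblyName;
    }
}
```

It would be attached through the formatter's `Binder` property, for example `new BinaryFormatter { Binder = new AllowListBinder(typeof(ConcurrentDictionary<string, DateTime>) /* plus the other expected types */) }`. Logging the rejected names during a trial run on a trusted file is one way to find out which types need to be listed.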

György Kőszeg
  • Thank you for your answer. But the problem remains the same; please refer to the comments on the question. Let me summarize: I have already serialized the data using BinaryFormatter and written it to the file. Now I want to use some other alternative which can read this file and deserialize its content. Note: The file was written using BinaryFormatter. – Vivek Nuna Aug 22 '23 at 16:12
  • One more question: does your package solve the security issue that exists with BinaryFormatter? – Vivek Nuna Aug 22 '23 at 16:13
  • Also, I tried to deserialize my file using your library and it threw an exception. You can reproduce it as well: just create a dictionary, serialize it to a file using BinaryFormatter, then deserialize it using your library. – Vivek Nuna Aug 22 '23 at 17:00
  • @VivekNuna: _"I tried to deserialise my file using your library"_ - I've just updated my answer. Of course, that will not work. And if it did, it would just reintroduce the security issues of `BinaryFormatter`. As `BinaryFormatter` supports only a very few types natively, it goes for recursive serialization even for a `decimal`, `DateTime` or `List`. Meaning, it stores the assembly qualified identity for them, which can be manipulated. See the details in the updated answer. – György Kőszeg Aug 22 '23 at 17:19
1

If your customers already have data saved with BinaryFormatter, you need to keep it for reading files, regardless of its security issues, until you have migrated all, or most, of your customers to some new format. There is to my knowledge no published specification of the format for BinaryFormatter, nor any other compatible libraries. And even if there were, I'm not sure it could solve the security problems, since the problems are inherent to the format itself.

So the first step should be to create a new format, using some well-designed serialization library. I mostly use JSON and protobuf (.NET), but there are plenty of good alternatives; see https://softwarerecs.stackexchange.com/ if you want recommendations. Just about anything should be better than BinaryFormatter.
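
Since the data in the question is only a map from `string` to `DateTime`, one dependency-free option besides the libraries above is a small hand-written format on top of `BinaryWriter` and `BinaryReader`. The sketch below is an illustration under assumptions, not an established format: the class name `JobsAckFile`, the 4-byte marker and the version number are all made up for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical flat file layout: marker, version, entry count, then (key, ticks+kind) pairs.
public static class JobsAckFile
{
    private const uint Magic = 0x4B43414A; // written little-endian as the bytes "JACK" (made-up marker)
    private const int FormatVersion = 1;   // made-up version number for this sketch

    public static void Save(string path, ConcurrentDictionary<string, DateTime> jobsAck)
    {
        // Take a snapshot so the count written always matches the entries written.
        KeyValuePair<string, DateTime>[] snapshot = jobsAck.ToArray();

        using (var stream = File.Open(path, FileMode.Create, FileAccess.Write))
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            writer.Write(Magic);
            writer.Write(FormatVersion);
            writer.Write(snapshot.Length);
            foreach (KeyValuePair<string, DateTime> pair in snapshot)
            {
                writer.Write(pair.Key);              // length-prefixed UTF-8 string
                writer.Write(pair.Value.ToBinary()); // 64-bit value that also carries DateTimeKind
            }
        }
    }

    public static ConcurrentDictionary<string, DateTime> Load(string path)
    {
        var result = new ConcurrentDictionary<string, DateTime>();
        if (!File.Exists(path)) return result;

        using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(stream, Encoding.UTF8))
        {
            if (reader.ReadUInt32() != Magic)
                throw new InvalidDataException("Not a jobs-ack file.");
            if (reader.ReadInt32() != FormatVersion)
                throw new InvalidDataException("Unsupported jobs-ack file version.");

            int count = reader.ReadInt32();
            if (count < 0)
                throw new InvalidDataException("Corrupt entry count.");

            // No collection is pre-sized from the untrusted count; a truncated file
            // simply ends in an EndOfStreamException.
            for (int i = 0; i < count; i++)
            {
                string key = reader.ReadString();
                result[key] = DateTime.FromBinary(reader.ReadInt64());
            }
        }
        return result;
    }
}
```

Because the reader only ever produces strings and 64-bit integers, the file cannot name a type to instantiate, which is the property BinaryFormatter lacks.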

You should then update your application so that it can no longer save files using BinaryFormatter, only in your new format. Depending on your exact use case, you might be able to convert saved data as soon as the new version is installed; in other cases you might only be able to do so when a user explicitly saves a file.
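
The "convert as soon as the new version is installed" variant could be sketched roughly as follows. This is only an outline under assumptions: `OneTimeMigration`, the `.tmp` and `.migrated` suffixes, and the two delegates are hypothetical names, and the delegates stand in for whatever legacy reader and new-format writer the application actually uses.

```csharp
using System;
using System.IO;

// Hypothetical one-time conversion step, run once at application start-up.
public static class OneTimeMigration
{
    // Returns true if a legacy file was converted, false if there was nothing to do.
    public static bool MigrateIfNeeded<T>(
        string legacyPath,
        string newPath,
        Func<string, T> loadLegacy,   // the only place the old reader is still invoked
        Action<string, T> saveNew)    // writer for the new format
    {
        // Already migrated, or no legacy data at all.
        if (File.Exists(newPath) || !File.Exists(legacyPath)) return false;

        T data = loadLegacy(legacyPath);

        // Write to a temporary file first so a crash cannot leave a half-written new file.
        string tempPath = newPath + ".tmp";
        if (File.Exists(tempPath)) File.Delete(tempPath);
        saveNew(tempPath, data);
        File.Move(tempPath, newPath);

        // Keep the old file under another name instead of deleting it outright.
        string backupPath = legacyPath + ".migrated";
        if (File.Exists(backupPath)) File.Delete(backupPath);
        File.Move(legacyPath, backupPath);
        return true;
    }
}
```

Keeping the legacy reader behind a single delegate makes it easy to delete later: once no installation has an old file left, the call site and the BinaryFormatter reference can be removed together.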

Once your updated application with support for the new format has been out for a while you can start thinking about removing support for BinaryFormatter. Users of older versions might be forced to update to an intermediate version and convert their files. Or you might publish a separate tool that only does conversions between the old format and the new format. You could also add a security warning when opening a file in the old format, to at least warn the user of the risk.
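
To show such a warning, the application first has to recognize an old file without deserializing it. A rough sketch, assuming that BinaryFormatter streams start with a 17-byte header record (a zero byte, two 32-bit ids, then major version 1 and minor version 0, all little-endian), could look like this. The class and method names are hypothetical, and the header layout is worth verifying against a few of your own files before relying on it.

```csharp
using System;
using System.IO;

// Hypothetical helper that inspects only the first bytes of a file; it never deserializes anything.
public static class LegacyFormatSniffer
{
    private const int HeaderLength = 17; // 1 record-type byte + four 32-bit integers

    public static bool LooksLikeBinaryFormatterFile(string path)
    {
        var header = new byte[HeaderLength];
        using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
        {
            int total = 0;
            while (total < header.Length)
            {
                int read = stream.Read(header, total, header.Length - total);
                if (read == 0) return false; // file is shorter than a header
                total += read;
            }
        }
        return LooksLikeBinaryFormatterHeader(header);
    }

    public static bool LooksLikeBinaryFormatterHeader(byte[] header)
    {
        if (header == null || header.Length < HeaderLength) return false;
        if (header[0] != 0) return false;             // record type of the stream header
        int major = ReadInt32LittleEndian(header, 9); // after the root id and header id
        int minor = ReadInt32LittleEndian(header, 13);
        return major == 1 && minor == 0;
    }

    private static int ReadInt32LittleEndian(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }
}
```

A loader could call `LooksLikeBinaryFormatterFile` first, show the warning and take the legacy path only when it returns true, and treat every other file as the new format.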

The main point here is that the sooner you introduce a new format, the sooner you can drop support for the old format. The length of this process will largely depend on your support commitments to customers, and willingness to make breaking changes.

JonasH
  • I am getting a runtime error when using protobuf: Could not load file or assembly 'System.Memory, Version=4.0.1.2, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) – Vivek Nuna Aug 22 '23 at 14:05
  • @VivekNuna That is likely because you are missing a reference to 'System.Memory, Version=4.0.1.2' or one of its dependencies. NuGet sometimes has problems installing transitive dependencies for projects in the old format, so you might need to install such dependencies yourself. – JonasH Aug 22 '23 at 14:19
  • Does that version even exist? https://www.nuget.org/packages/System.Memory#versions-body-tab – Rand Random Aug 22 '23 at 14:21
  • @RandRandom It at least existed at one point in time. But as long as the major version is the same, newer versions should be backward compatible. So using the latest 4.x should work. – JonasH Aug 22 '23 at 14:27
  • @JonasH That is why I am wondering. I already have version 4.5.5 of System.Memory. I have raised a bug here as well: https://github.com/protobuf-net/protobuf-net/issues/1092. Let's see. – Vivek Nuna Aug 22 '23 at 14:34
  • @VivekNuna I have no real idea. Normally, assembly bindings in the app.config file should instruct the CLR to load the newer version if an older version is requested. But there are plenty of ways for dependencies to go wrong. – JonasH Aug 22 '23 at 14:46