134

I have a collection of objects that I need to write to a binary file.

I need the bytes in the file to be compact, so I can't use BinaryFormatter. BinaryFormatter throws in all sorts of info for deserialization needs.

If I try

byte[] myBytes = (byte[]) myObject 

I get a runtime exception.

I need this to be fast so I'd rather not be copying arrays of bytes around. I'd just like the cast byte[] myBytes = (byte[]) myObject to work!

OK just to be clear, I cannot have any metadata in the output file. Just the object bytes. Packed object-to-object. Based on answers received, it looks like I'll be writing low-level Buffer.BlockCopy code. Perhaps using unsafe code.

Luke Willis
chuckhlogan

16 Answers

224

To convert an object to a byte array:

// Convert an object to a byte array
public static byte[] ObjectToByteArray(Object obj)
{
    BinaryFormatter bf = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        bf.Serialize(ms, obj);
        return ms.ToArray();
    }
}

Just copy this function into your code and pass it the object you need to convert to a byte array. If you need to convert the byte array back to an object, use the function below:

// Convert a byte array to an Object
public static Object ByteArrayToObject(byte[] arrBytes)
{
    using (var memStream = new MemoryStream())
    {
        var binForm = new BinaryFormatter();
        memStream.Write(arrBytes, 0, arrBytes.Length);
        memStream.Seek(0, SeekOrigin.Begin);
        var obj = binForm.Deserialize(memStream);
        return obj;
    }
}

You can use these functions with custom classes; just add the [Serializable] attribute to your class to enable serialization.

d219
Crystalonics
    I tried this and it added all sorts of metadata. The OP said he did not want metadata. – user316117 Jan 23 '13 at 19:47
  • Not to mention, everyone seems to assume that what you're trying to serialize is something that you have written, or has already been set up to be serialized. – Hexum064 Nov 20 '15 at 17:23
  • You can pass the byte array directly to the constructor of `MemoryStream` in the second code example. This would eliminate the use of `Write(...)` and `Seek(...)`. – unknown6656 Feb 04 '17 at 13:08
  • Use of binary formatter is now considered unsafe. https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide#preferred-alternatives – uniquelau Jan 13 '21 at 15:57
  • I am working with a pdf object that the third party vendor is marked as Sealed. So I can't mark the class as [Serializable] – dotnetdev_2009 Jun 21 '22 at 23:31
  • This method is giving me out of memory exception because object size is big. How to handle it? – Anil Aug 26 '22 at 23:37
    this does not answer the question at all! OP clearly stated that he wants the raw underlying bytes... – Tomer W Sep 09 '22 at 18:41
52

If you want the serialized data to be really compact, you can write serialization methods yourself. That way you will have a minimum of overhead.

Example:

public class MyClass {

   public int Id { get; set; }
   public string Name { get; set; }

   public byte[] Serialize() {
      using (MemoryStream m = new MemoryStream()) {
         using (BinaryWriter writer = new BinaryWriter(m)) {
            writer.Write(Id);
            writer.Write(Name);
         }
         return m.ToArray();
      }
   }

   public static MyClass Deserialize(byte[] data) {
      MyClass result = new MyClass();
      using (MemoryStream m = new MemoryStream(data)) {
         using (BinaryReader reader = new BinaryReader(m)) {
            result.Id = reader.ReadInt32();
            result.Name = reader.ReadString();
         }
      }
      return result;
   }

}
Guffa
  • what if I have several ints to write, and several strings? – Smith May 03 '15 at 20:24
  • @Smith: Yes, you can do that, just write them after each other. The `BinaryWriter` will write them in a format that the `BinaryReader` can read, as long as you write and read them in the same order. – Guffa May 03 '15 at 21:12
  • what is the difference between `BinaryWriter/Reader` and using a `BinaryFormatter`? – Smith May 03 '15 at 21:20
  • @Smith: Using `BinaryWriter/Reader` you do the serialisation/deserialisation yourself, and you can write/read only the data that is absolutely needed, as compact as possible. The `BinaryFormatter` uses reflection to find out what data to write/read, and uses a format that works for all possible cases. It also includes the meta information about the format in the stream, so that adds even more overhead. – Guffa May 03 '15 at 21:29
  • I have a property `command` which is an enum, but I did not see a `ReadEnum` like `ReadInt32()`, how do I write enums and read them? – Smith May 03 '15 at 21:37
  • @Smith: You can cast the enum to `int` (or whatever other type you have specified as storage for the enum) and write it. When you read it you can cast it back to the enum type. – Guffa May 03 '15 at 21:40
  • Finally someone that offers a black on white alternative to BinaryFormatter which does not involve any external dependencies. Thank you. – Alexandru Dicu Nov 20 '18 at 12:11
32

Well a cast from myObject to byte[] is never going to work unless you've got an explicit conversion or if myObject is a byte[]. You need a serialization framework of some kind. There are plenty out there, including Protocol Buffers which is near and dear to me. It's pretty "lean and mean" in terms of both space and time.

You'll find that almost all serialization frameworks have significant restrictions on what you can serialize, however - Protocol Buffers more than some, due to being cross-platform.

If you can give more requirements, we can help you out more - but it's never going to be as simple as casting...

EDIT: Just to respond to this:

I need my binary file to contain the object's bytes. Only the bytes, no metadata whatsoever. Packed object-to-object. So I'll be implementing custom serialization.

Please bear in mind that the bytes in your objects are quite often references... so you'll need to work out what to do with them.

I suspect you'll find that designing and implementing your own custom serialization framework is harder than you imagine.

I would personally recommend that if you only need to do this for a few specific types, you don't bother trying to come up with a general serialization framework. Just implement an instance method and a static method in all the types you need:

public void WriteTo(Stream stream)
public static WhateverType ReadFrom(Stream stream)

One thing to bear in mind: everything becomes more tricky if you've got inheritance involved. Without inheritance, if you know what type you're starting with, you don't need to include any type information. Of course, there's also the matter of versioning - do you need to worry about backward and forward compatibility with different versions of your types?
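The suggested pattern might be sketched like this; `Score` is a hypothetical type standing in for one of yours, and `leaveOpen: true` keeps the caller in control of the stream's lifetime:

```csharp
using System.IO;
using System.Text;

// Hypothetical type illustrating the WriteTo/ReadFrom pattern suggested above.
public sealed class Score
{
    public int Id;
    public double Value;

    public void WriteTo(Stream stream)
    {
        // leaveOpen: true, so disposing the writer doesn't close the caller's stream
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(Id);
        writer.Write(Value);
    }

    public static Score ReadFrom(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        // Read fields in exactly the order they were written
        return new Score { Id = reader.ReadInt32(), Value = reader.ReadDouble() };
    }
}
```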

Jon Skeet
  • Is it more correct for me to refer to this as "protobuf-csharp-port" (Google-code), or "dotnet-protobufs" (Git)? – Marc Gravell Sep 18 '09 at 20:16
  • I need my binary file to contain the object's bytes. Only the bytes, no metadata whatsoever. Packed object-to-object. So I'll be implementing custom serialization. – chuckhlogan Sep 18 '09 at 20:46
  • The risk of *zero* metadata is that you are then *very* version-intolerant, as it has very few ways of allowing flexibility before it is too late. Protocol buffers is pretty data-dense. Do you really need that extra turn of the screw? – Marc Gravell Sep 18 '09 at 21:14
  • @Marc: And of course for integers, PB can end up being denser than the raw bytes... – Jon Skeet Sep 18 '09 at 21:40
23

I took Crystalonics' answer and turned it into extension methods. I hope someone else will find them useful:

public static byte[] SerializeToByteArray(this object obj)
{
    if (obj == null)
    {
        return null;
    }
    var bf = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        bf.Serialize(ms, obj);
        return ms.ToArray();
    }
}

public static T Deserialize<T>(this byte[] byteArray) where T : class
{
    if (byteArray == null)
    {
        return null;
    }
    using (var memStream = new MemoryStream())
    {
        var binForm = new BinaryFormatter();
        memStream.Write(byteArray, 0, byteArray.Length);
        memStream.Seek(0, SeekOrigin.Begin);
        var obj = (T)binForm.Deserialize(memStream);
        return obj;
    }
}
jhilden
23

Use of BinaryFormatter is now considered unsafe; see the Microsoft docs.

Just use System.Text.Json:

To serialize to bytes:

JsonSerializer.SerializeToUtf8Bytes(obj);

To deserialize to your type:

JsonSerializer.Deserialize<YourType>(byteArray);
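A complete round trip might look like this; `Person` is a hypothetical type standing in for your own:

```csharp
using System.Text.Json;

// Hypothetical type; System.Text.Json can serialize it with no attributes needed.
public record Person(int Id, string Name);

public static class JsonRoundTrip
{
    // Serialize directly to UTF-8 bytes, skipping the intermediate string
    public static byte[] ToBytes(Person p) => JsonSerializer.SerializeToUtf8Bytes(p);

    // Deserialize needs the concrete type as a generic argument
    public static Person FromBytes(byte[] bytes) => JsonSerializer.Deserialize<Person>(bytes);
}
```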

MichaelK
14

You are really talking about serialization, which can take many forms. Since you want small and binary, protocol buffers may be a viable option - giving version tolerance and portability as well. Unlike BinaryFormatter, the protocol buffers wire format doesn't include all the type metadata; just very terse markers to identify data.

In .NET there are a few implementations, notably protobuf-net and protobuf-csharp-port.

I'd humbly argue that protobuf-net (which I wrote) allows more .NET-idiomatic usage with typical C# classes ("regular" protocol-buffers tends to demand code-generation); for example:

[ProtoContract]
public class Person {
   [ProtoMember(1)]
   public int Id {get;set;}
   [ProtoMember(2)]
   public string Name {get;set;}
}
....
Person person = new Person { Id = 123, Name = "abc" };
Serializer.Serialize(destStream, person);
...
Person anotherPerson = Serializer.Deserialize<Person>(sourceStream);
Timeless
Marc Gravell
  • Even "terse markers" are still metadata. My understanding of what the OP wanted was nothing but the data in the object. So, for example, if the object was a struct with 2 32-bit integers, then he would expect the result to be a byte array of 8 bytes. – user316117 Jan 23 '13 at 19:52
  • @user316117 which is then a real pain for versioning. Each approach has advantages and disadvantages. – Marc Gravell Jan 23 '13 at 20:30
  • [How to choose between protobuf-csharp-port and protobuf-net](https://stackoverflow.com/questions/2522376/how-to-choose-between-protobuf-csharp-port-and-protobuf-net) – Timeless Aug 07 '17 at 06:02
  • There is a way to avoid using the Proto* attributes ? The entities I want to use are in a 3rd party library. – Alex 75 Sep 18 '17 at 16:38
4

This worked for me:

byte[] bfoo = (byte[])foo;

foo is an Object that I'm 100% certain is a byte array.

The Berga
4

I found that this method worked correctly for me, using Newtonsoft.Json:

public TData ByteToObj<TData>(byte[] arr)
{
    return JsonConvert.DeserializeObject<TData>(Encoding.UTF8.GetString(arr));
}

public byte[] ObjToByte<TData>(TData data)
{
    var json = JsonConvert.SerializeObject(data);
    return Encoding.UTF8.GetBytes(json);
}
2

Take a look at serialization, a technique to "convert" an entire object to a byte stream. You can send it over the network or write it to a file, and then restore it back to an object later.

Atmocreations
  • I think chuckhlogan explicitly declined that (Formatter==Serialization). – H H Sep 18 '09 at 20:12
  • @Henk - it depends what the *reasons* are; he mentioned the extra info, which I take to be type metadata and field info; you can use serialization without that overhead; just not with `BinaryFormatter`. – Marc Gravell Sep 18 '09 at 20:19
1

To access the memory of an object directly (to do a "core dump") you'll need to head into unsafe code.

If you want something more compact than BinaryWriter or a raw memory dump will give you, then you need to write some custom serialisation code that extracts the critical information from the object and packs it in an optimal way.

Edit: P.S. it's very easy to wrap the BinaryWriter approach in a DeflateStream to compress the data, which will usually roughly halve its size.
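As a sketch of that suggestion: layer a DeflateStream between the BinaryWriter and the MemoryStream. The `CompressedSerializer` name and the `Action<BinaryWriter>` shape are illustrative stand-ins for whatever custom serialization code you already have:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class CompressedSerializer
{
    public static byte[] Serialize(Action<BinaryWriter> writeTo)
    {
        using var ms = new MemoryStream();
        using (var deflate = new DeflateStream(ms, CompressionMode.Compress, leaveOpen: true))
        using (var writer = new BinaryWriter(deflate))
        {
            writeTo(writer);
        } // disposing the DeflateStream flushes the remaining compressed bytes into ms
        return ms.ToArray();
    }

    public static BinaryReader OpenReader(byte[] data) =>
        new BinaryReader(new DeflateStream(new MemoryStream(data), CompressionMode.Decompress));
}
```

Note that compression only pays off once the payload has enough redundancy; tiny objects may actually grow slightly.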

Jason Williams
  • Unsafe code isn't enough. C# and CLR still won't let you take a raw pointer to a managed object even in unsafe code, or put two object references in a union. – Pavel Minaev Sep 18 '09 at 20:05
1

I believe what you're trying to do is impossible.

The "junk" that BinaryFormatter creates is necessary to recover the object from the file after your program has stopped.
It is, however, possible to get the object's raw data; you just need to know its exact size (which is more difficult than it sounds):

public static unsafe byte[] Binarize(object obj, int size)
{
    var r = new byte[size];
    var rf = __makeref(obj);
    var a = **(IntPtr**)(&rf);
    Marshal.Copy(a, r, 0, size);
    return r;
}

this can be recovered via:

public unsafe static dynamic ToObject(byte[] bytes)
{
    var rf = __makeref(bytes);
    **(int**)(&rf) += 8;
    return GCHandle.Alloc(bytes).Target;
}

The reason why the above methods don't work for serialization is that the first four bytes in the returned data correspond to a RuntimeTypeHandle. The RuntimeTypeHandle describes the layout/type of the object, but its value changes every time the program is run.

EDIT: that is stupid, don't do that --> If you already know the type of the object to be deserialized for certain, you can switch those bytes for BitConverter.GetBytes((int)typeof(yourtype).TypeHandle.Value) at the time of deserialization.

balage
0

I found another way to convert an object to a byte[], here is my solution:

IEnumerable en = (IEnumerable) myObject;
byte[] myBytes = en.OfType<byte>().ToArray();


BatteryAcid
kyy8080
  • I don't think this method converts the object to byte[]; rather it finds the byte-typed items in the object and reports those back. – Rahul Ranjan Jul 09 '21 at 11:00
0

This method returns an array of bytes from an object.

private byte[] ConvertBody(object model)
{
    return Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(model));
}
Yudner
0

Spans are very useful for something like this. To put it simply, they are very fast ref structs that hold a pointer to the first element and a length. They guarantee a contiguous region of memory, and the JIT compiler is able to optimize based on these guarantees. They work much like the pointer-plus-length pairs you see all the time in C and C++.

Ever since spans have been added, you are able to use two MemoryMarshal functions that can get all bytes of an object without the overhead of streams. Under the hood, it is just a little bit of casting. Just like you asked, there are no extra allocations going down to the bytes unless you copy them to an array or another span. Here is an example of the two functions in use to get the bytes of one:

public static Span<byte> GetBytes<T>(ref T o)
    where T : struct
{
    if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        throw new Exception($"Type {typeof(T).Name} is or contains a reference");

    var singletonSpan = MemoryMarshal.CreateSpan(ref o, 1);
    var bytes = MemoryMarshal.AsBytes(singletonSpan);
    return bytes;
}

The first function, MemoryMarshal.CreateSpan, takes a reference to an object with a length for how many adjacent objects of the same type come immediately after it. They must be adjacent because spans guarantee contiguous regions of memory. In this case, the length is 1 because we are only working with the single object. Under the hood, it is done by creating a span beginning at the first element.

The second function, MemoryMarshal.AsBytes, takes a span and turns it into a span of bytes. This span still covers the argument object so any changes to the bytes will be reflected within the object. Fortunately, spans have a method called ToArray which copies all of the contents from the span into a new array. Under the hood, it creates a span over bytes instead of T and adjusts the length accordingly. If there's a span you want to copy into instead, there's the CopyTo method.

The if statement is there to ensure that you are not copying the bytes of a type that is or contains a reference for safety reasons. If it is not there, you may be copying a reference to an object that doesn't exist.

The type T must be a struct because MemoryMarshal.AsBytes requires a non-nullable type.
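A minimal usage sketch of those two calls, with a hypothetical blittable struct (`Pair`: two ints, no references, 8 bytes total); the `SpanDemo` wrapper is illustrative only:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable struct: safe to view as raw bytes.
struct Pair { public int A; public int B; }

static class SpanDemo
{
    public static byte[] Pack(Pair p)
    {
        Span<Pair> one = MemoryMarshal.CreateSpan(ref p, 1); // length 1: just this struct
        Span<byte> bytes = MemoryMarshal.AsBytes(one);       // reinterpret as raw bytes, no copy
        return bytes.ToArray();                              // copy out only when an array is needed
    }
}
```

Until `ToArray` is called, no allocation happens; the byte span aliases the struct's own memory.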

pvc pipe
0

Because of this compiler warning:

SerialDeserializerDefaultConcrete.cs(50, 17): [SYSLIB0011] 'BinaryFormatter.Serialize(Stream, object)' is obsolete: 'BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.'

I have moved to the JSON solutions for later .NET (Core-based) target frameworks.

And because of the tension between pre-System.Text.Json and System.Text.Json frameworks, I have created a "both" answer.

Note, my answer is a mixture of everything else above.

I have added interface and concrete encapsulation. I believe in "write to an interface, not a concrete".

namespace MyStuff.Interfaces
{
    public interface ISerialDeserializer<T> where T : new()
    {
        byte[] SerializeToByteArray(T obj);

        T Deserialize(byte[] byteArray);
    }
}



#if NET6_0_OR_GREATER
using System.Text.Json;
#endif

#if !NET6_0_OR_GREATER
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
#endif


using MyStuff.Interfaces;

namespace MyStuff.Concrete
{
    public class SerialDeserializerDefaultConcrete<T> : ISerialDeserializer<T> where T : new()
    {
#if NET6_0_OR_GREATER
        public byte[] SerializeToByteArray(T obj)
        {
            if (obj == null)
            {
                return null;
            }

            return JsonSerializer.SerializeToUtf8Bytes(obj);
        }

        public T Deserialize(byte[] byteArray)
        {
            if (byteArray == null)
            {
                return default(T);
            }

            return JsonSerializer.Deserialize<T>(byteArray);
        }
#endif

#if !NET6_0_OR_GREATER
        public byte[] SerializeToByteArray(T obj)
        {
            if (obj == null)
            {
                return null;
            }

            var bf = new BinaryFormatter();
            using (var ms = new MemoryStream())
            {
                bf.Serialize(ms, obj);
                return ms.ToArray();
            }
        }

        public T Deserialize(byte[] byteArray)
        {
            if (byteArray == null)
            {
                return default(T);
            }

            using (var memStream = new MemoryStream())
            {
                var binForm = new BinaryFormatter();
                memStream.Write(byteArray, 0, byteArray.Length);
                memStream.Seek(0, SeekOrigin.Begin);
                var obj = (T) binForm.Deserialize(memStream);
                return obj;
            }
        }
#endif
    }
}

and the target-frameworks of my csproj:

<PropertyGroup>
    <TargetFrameworks>netstandard2.0;netstandard2.1;net6.0</TargetFrameworks>
</PropertyGroup>

You could also (probably) use Newtonsoft for pre-6.0 frameworks. Newtonsoft references have sometimes been problematic, which is why I went with the MemoryStream version for pre-System.Text.Json frameworks.

granadaCoder
-1

You can use the method below to convert a list of objects into a byte array using System.Text.Json serialization.

private static byte[] ConvertToByteArray(List<object> mergedResponse)
{
    var options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true,
    };

    if (mergedResponse != null && mergedResponse.Any())
    {
        return JsonSerializer.SerializeToUtf8Bytes(mergedResponse, options);
    }

    return new byte[] { };
}
CHIRAG LADDHA