
I'm trying to serialize a reasonably large amount of data with protobuf-net, but I'm hitting OutOfMemoryExceptions. To keep memory usage down I'm streaming the data as an IEnumerable<DTO>. Here's a simplified version of the program that reproduces the error:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        using (var f = File.Create("Data.protobuf"))
        {
            ProtoBuf.Serializer.Serialize<IEnumerable<DTO>>(f, GenerateData(1000000));
        }

        using (var f = File.OpenRead("Data.protobuf"))
        {
            var dtos = ProtoBuf.Serializer.DeserializeItems<DTO>(f, ProtoBuf.PrefixStyle.Base128, 1);
            Console.WriteLine(dtos.Count());
        }
        Console.Read();
    }

    static IEnumerable<DTO> GenerateData(int count)
    {
        for (int i = 0; i < count; i++)
        {
            // reduce to 1100 to use much less memory
            var dto = new DTO { Data = new byte[1101] };
            for (int j = 0; j < dto.Data.Length; j++)
            {
                // fill with data
                dto.Data[j] = (byte)(i + j);
            }
            yield return dto;
        }
    }
}

[ProtoBuf.ProtoContract]
class DTO
{
    [ProtoBuf.ProtoMember(1, DataFormat=ProtoBuf.DataFormat.Group)]
    public byte[] Data
    {
        get;
        set;
    }
}

Interestingly, if you reduce the size of the array on each DTO to 1100, the problem goes away! In my actual code I'll be serializing an array of floats rather than bytes, but the shape is similar (a rough sketch of that contract is below). N.B. I think you can skip the fill-with-data step and still reproduce the problem more quickly.
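For what it's worth, the real contract will look roughly like this (the FloatDTO name and the IsPacked setting are just illustrative; I haven't committed to a data format yet):

[ProtoBuf.ProtoContract]
class FloatDTO
{
    // hypothetical float-based DTO; packed encoding keeps repeated
    // primitives compact, but any data format should show the same behaviour
    [ProtoBuf.ProtoMember(1, IsPacked = true)]
    public float[] Data { get; set; }
}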

This is using protobuf-net version 2.0.0.594. Any help would be much appreciated!

EDIT:

Same problem seen with version 2.0.0.480. Code wouldn't run with version 1.0.0.280.

markmuetz
  • How big is the file when it's saved to disk (MB, GB)? – Kiril Oct 31 '12 at 16:43
  • @Lirik: If you set the size of the array to 1100, the file ends up being pretty large: just over one GB. Obviously the file doesn't get written if the exception is thrown. – markmuetz Oct 31 '12 at 16:46
  • C# cannot support contiguous memory allocations of more than about 1.5 GB; even if the serialized data is less than 1 GB, it may be closer to 1.5 GB when deserialized. – Kiril Oct 31 '12 at 16:48
  • One more clue: If you set the array size to 1100, you can watch the file grow as the program runs. If you set it to 1101, then the file size stays the same (0) until the exception is thrown. Could it be a problem with the file not being flushed to disk? – markmuetz Oct 31 '12 at 17:13
  • this also throws immediately when you run the program, without the memory actually being allocated: `byte[] data = new byte[2147483648];` – Kiril Oct 31 '12 at 17:19
  • btw, this should be fixed in the current builds (600+) – Marc Gravell Nov 08 '12 at 10:00

2 Answers


k; this was some unfortunate timing - basically, it was only checking whether it should flush when the buffer got full, and because it was in the middle of writing a length-prefixed item at that point, it could never properly flush. I've added a tweak so that whenever it reaches a flushable state and there is something worth flushing (currently 1024 bytes), it flushes more aggressively. This has been committed as r597. With that patch, it now works as expected.

In the interim, there is a way of avoiding this glitch without changing versions: iterate over the data at the source, serializing each item individually with SerializeWithLengthPrefix, specifying prefix-style base-128 and field-number 1; this is 100% identical in terms of what goes over the wire, but uses a separate serialization cycle for each item:

using (var f = File.Create("Data.protobuf"))
{
    foreach(var obj in GenerateData(1000000))
    {
        Serializer.SerializeWithLengthPrefix<DTO>(
            f, obj, PrefixStyle.Base128, Serializer.ListItemTag);
    }
}
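Reading it back can stay streaming too; for example (this just mirrors the DeserializeItems call from your question - Serializer.ListItemTag is the same field-number 1):

using (var f = File.OpenRead("Data.protobuf"))
{
    // lazily yields one DTO at a time; nothing is buffered up front
    foreach (var dto in Serializer.DeserializeItems<DTO>(
        f, PrefixStyle.Base128, Serializer.ListItemTag))
    {
        // process each item here
    }
}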

Thanks for noticing ;p

Marc Gravell

It seems that you're exceeding the 1.5 GB limit: Allocating more than 1,000 MB of memory in 32-bit .NET process

You've already noticed that when you reduce the sample size, your application runs fine. This is not an issue with protobuf (I presume), but with your attempt to create an array which requires more than 1.5 GB of memory to be allocated.

Update

Here is a simple test:

byte[] data = new byte[2147483648];

That should cause an OutOfMemoryException, as would this:

// 1024 chunks of 2 MB each: 2 GB in total
byte[][] buffer = new byte[1024][];
for (int i = 0; i < 1024; i++)
{
    buffer[i] = new byte[2097152];
}

Something is aggregating your data bytes into a contiguous container of more than 1.5 GB.
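If you want a rough idea of how much memory is in play while the serializer runs (short of using a full profiler), a quick sketch along these lines can help - these are standard framework calls, nothing protobuf-specific:

// crude memory check: managed heap size plus the process working set
long managedBytes = GC.GetTotalMemory(false);
long workingSet = System.Diagnostics.Process.GetCurrentProcess().WorkingSet64;
Console.WriteLine("Managed heap: {0:N0} bytes, working set: {1:N0} bytes",
    managedBytes, workingSet);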

Kiril
  • Memory should not be being used; the `yield return` in the `for` loop should see to that. Also, if this were the problem, I should see it regardless of whether I set the size of the array to 1100 or 1101, shouldn't I? – markmuetz Oct 31 '12 at 16:55
  • Additionally, I'm only using about 300 MB of memory (according to Task Manager at least) when the exception gets thrown. – markmuetz Oct 31 '12 at 16:56
  • @markmuetz which line does the exception occur on? Furthermore, I'm not concerned about whether or not you `yield return`, but how much contiguous memory is actually allocated. If somewhere in your application 1.5 GB of memory is allocated, you will get the out of memory exception. – Kiril Oct 31 '12 at 17:04
  • the exception is thrown on the `ProtoBuf.Serializer.Serialize(...)` line. Its `Source` is protobuf-net. I can stick the `GenerateData(1000000)` call in a `foreach` loop and loop over each `DTO` without any difficulty. It might be the file object (`f`) that is causing the problem with the contiguous data, see comment above. – markmuetz Oct 31 '12 at 17:11
  • @markmuetz That's understandable: if I remove the `buffer[i]` and replace it with a `byte[] data` in my loop, then things will work fine too. The problem is that `ProtoBuf` might be aggregating the data into a single container, which will not work. You may also be able to "fix" the issue by reducing the number of items generated from 1000000 to, say, 500000 - see if that works. – Kiril Oct 31 '12 at 17:14
  • @markmuetz In that case the serializer might buffer it in memory rather than flushing it to the file in between. – René Wolferink Oct 31 '12 at 17:16
  • @Lirik with array length set to 1101, reducing the number of `DTO`s to 242000 means it will run and write the file successfully. It uses a lot of memory though (285 MB). This is a lot more than when the array length is 1100, which only uses about 16-17 MB of RAM. – markmuetz Oct 31 '12 at 17:23
  • @RenéWolferink How would I find out how much memory the serializer is using? I've got experience of using the Red Gate Ants profiler but unfortunately no licensed copy right now. – markmuetz Oct 31 '12 at 17:24
  • @markmuetz that's definitely an interesting issue and I'm not really sure what else could be going on there. However, the main issue that I wanted to address is the fact that at some point, your application is attempting to utilize 1.5 GB of contiguous space. – Kiril Oct 31 '12 at 18:57
  • damnedable bugs creeping in and wotnot! Your analysis of the symptom is correct, but there was no reason (other than fat-fingers) that it needed to do this - see my answer for an update. – Marc Gravell Oct 31 '12 at 20:31