
I need to deserialize a 1.5 GB .txt file. I am using protobuf-net from code.google.com/p/protobuf-net/

Sometimes it fails (in about 50% of cases) with different exceptions (null reference, memory access violation) in different places. I have noticed that when the processor is lightly loaded, the probability of failure decreases.

What should I do to avoid such failures?

Here is an example of the deserialization code:

public static History LoadFromFile(string path)
{
    using (var fileStream = File.OpenRead(path))
    {
        return Serializer.Deserialize<History>(fileStream);
    }
}

Today I got a FatalExecutionEngineError with error code 0xc0000005, but I can't figure out which part of the code might be unsafe. The error is not consistent; everything works correctly after I restart the application.

Here is an example of the serialized files that I need to deserialize: https://docs.google.com/file/d/0B1XaGInC6jg3ZXBZZDA3bHh3bVk/edit

  • Confused... Protobuf != text. Can you clarify what you are doing? Maybe some code? – Marc Gravell Jan 08 '13 at 15:54
  • I understand the difference. At first I serialized a large amount of data into a .txt file for performance reasons. By the way, it fails when the size of the resulting file is more than 2GB; I think the cause is the int variables in protobuf-net.dll. After that I tried serializing the data into several files and reassembling the whole object on deserialization, but then the different errors I mentioned above appeared. – user1958298 Jan 08 '13 at 16:51
  • I'll have to investigate the 2GB issue, but: the data being serialized may impact how much bigger it can get (some things may require buffering). Incidentally, the core google protobuf docs advise against huge files. But: you still haven't given me much to go on in terms of reproducing it. I can try some random things, but I can't guarantee I'll get a repro. – Marc Gravell Jan 08 '13 at 18:29
  • @MarcGravell, I've added some code and file examples to the question. Please tell me if you need any other information. It's crucial for me to avoid this intermittent error. – user1958298 Jan 09 '13 at 10:44
  • with file "d" I can see 383232 outer nodes; file "e" seems corrupt - after 133792 outer nodes (and 538642144 bytes), I get a field 0 (which is illegal). However, because protobuf is an ambiguous wire format (you cannot reliably understand the data without the schema), can you perhaps share some schema information? or ideally: the `History` class? Either here on stackoverflow, or by email if it is sensitive – Marc Gravell Jan 09 '13 at 11:06
  • also, I don't suppose you have a StackTrace from any of the failures? – Marc Gravell Jan 09 '13 at 11:09
  • I've sent you the History class by email. I only have some screenshots of the errors, with no additional info. – user1958298 Jan 09 '13 at 11:34
  • I attempted to use Protobuf for large state persistence and was really surprised to find that the overhead of the inefficient implementation used for serialization made its performance on par with JSON.NET's json serializer. I've tried MessagePack too with no success. Anyone know any other libraries that are efficient as serialization/deserialization at large objects that can reach >1GB when persisted? At this point I'm basically stuck with JSON.NET. – Austin Salgat Aug 10 '18 at 04:29

1 Answer


From the Google Protocol Buffers documentation:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy. That said, Protocol Buffers are great for handling individual messages within a large data set.

Source link
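Following that advice, one way to apply it with protobuf-net is to serialize the history as a stream of many small length-prefixed records instead of one giant message, using `Serializer.SerializeWithLengthPrefix` and `Serializer.DeserializeItems`. This is only a sketch: the real `History` schema wasn't posted, so the `Record` type below is a hypothetical stand-in for whatever item type the history contains.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Record // hypothetical item type; substitute your real schema
{
    [ProtoMember(1)]
    public long Timestamp { get; set; }

    [ProtoMember(2)]
    public double Value { get; set; }
}

public static class HistoryStore
{
    // Write each record as its own small, length-prefixed message,
    // so no single message ever approaches the 2 GB limit.
    public static void Save(string path, IEnumerable<Record> records)
    {
        using (var stream = File.Create(path))
        {
            foreach (var record in records)
            {
                Serializer.SerializeWithLengthPrefix(
                    stream, record, PrefixStyle.Base128, fieldNumber: 1);
            }
        }
    }

    // Stream the records back lazily instead of materializing
    // one huge object graph in memory.
    public static IEnumerable<Record> Load(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            foreach (var record in Serializer.DeserializeItems<Record>(
                stream, PrefixStyle.Base128, fieldNumber: 1))
            {
                yield return record;
            }
        }
    }
}
```

Because `Load` is an iterator, the caller can process the file record by record with roughly constant memory, which also sidesteps the intermittent failures that show up when a single multi-gigabyte deserialization is in flight.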

gokan