Can protobuf read partially?

Question

I want to save my terrain data to a file and load only parts of it, because it is too big to hold in memory as a whole. I don't even know whether protobuf is suitable for this purpose.

For example, I would have a structure like this (the syntax might be invalid; I only know the basics):

message Quad {
    required int32 x = 1;
    required int32 z = 2;

    repeated int32 y = 3;
}

The x and z values are available in my program, and I would like to use them to find the Quad object in the file with the same x and z, so I can obtain its y values. However, I can't just parse the file with ParseFromIstream(), because (I think) it loads the whole file into memory, and in my case the file is too big.

So, can protobuf load one object, hand it to me for checking, and, if it is the wrong one, give me the next?

Actually, I could just ask: does ParseFromIstream() load the whole file into memory?

@infact that is a ridiculous comment without some form of qualification, I.e. what you feel it fails at. — Marc Gravell, Feb 15 '13 at 17:39

score 4 · Answer 1 · answered Feb 16 '13 at 08:45

While some libraries do allow you to read files partially, the technique recommended by Google is to simply have the file consist of multiple messages:

https://developers.google.com/protocol-buffers/docs/techniques

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data.

So you could just write a long sequence of Quad messages to the file, delimited by the lengths of the messages. If you need to seek randomly to specific Quads, you may want to add some kind of an index.
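As a rough illustration of the length-delimited layout plus index described above, here is a minimal C++ sketch. It is not the protobuf API: AppendRecord, ReadRecordAt and QuadIndex are hypothetical names, and the 4-byte little-endian length prefix is an assumption chosen for brevity (protobuf's own delimited formats typically use a varint prefix). The payload is treated as an opaque byte string; in a real program it would presumably come from something like quad.SerializeToString(&payload).

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical index type: (x, z) -> byte offset of that Quad's record.
using QuadIndex = std::map<std::pair<int, int>, std::streamoff>;

// Hypothetical helper: append [4-byte little-endian length][payload]
// and return the offset at which the record starts.
std::streamoff AppendRecord(std::ostream& out, const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);  // length must fit in the header
    const std::streamoff offset = out.tellp();
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>(n & 0xFF), static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF), static_cast<char>((n >> 24) & 0xFF)};
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return offset;
}

// Hypothetical helper: seek to `offset` and read exactly one record.
// Only that record's bytes are brought into memory.
bool ReadRecordAt(std::istream& in, std::streamoff offset, std::string* payload) {
    in.clear();  // drop any EOF/fail state left by an earlier read
    in.seekg(offset, std::ios::beg);
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), 4)) return false;
    const std::uint32_t n = static_cast<std::uint32_t>(header[0]) |
                            (static_cast<std::uint32_t>(header[1]) << 8) |
                            (static_cast<std::uint32_t>(header[2]) << 16) |
                            (static_cast<std::uint32_t>(header[3]) << 24);
    payload->resize(n);
    if (n == 0) return true;  // empty record, nothing more to read
    return static_cast<bool>(in.read(&(*payload)[0], static_cast<std::streamsize>(n)));
}
```

With the index kept in memory (or saved in a separate small file), looking up one Quad costs a single seek plus one small read, and the resulting bytes can then be handed to ParseFromString().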

score 2 · Accepted Answer · answered Feb 15 '13 at 17:37


This depends on which implementation you are using. Some have "read as a sequence" APIs. For example, assuming you stored it as a "repeated Quad", then with protobuf-net that would be:

// x and z identify the Quad to look for (y is the repeated payload field)
int x = ..., z = ...;
var found = Serializer.DeserializeItems<Quad>(source)
            .Where(q => q.x == x && q.z == z);

The point being: it yields a spooling (not loaded all at once) and short-circuiting sequence.

I don't know the C++ API specifically, but I would hope it has something similar; worst case, you could parse the varint headers and prepare a length-capped stream.
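A minimal sketch of that worst-case approach in plain C++, assuming each record in the file is written as a base-128 varint length followed by the message bytes. ReadVarint, WriteVarint and FindFirst are hypothetical helper names, not part of the protobuf library, and the predicate stands in for "parse this buffer as a Quad and compare x and z".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write `value` as a base-128 varint: 7 bits per byte, low bits first,
// high bit set on every byte except the last.
void WriteVarint(std::ostream& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.put(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.put(static_cast<char>(value));
}

// Read one varint; returns false on end of stream or malformed input.
bool ReadVarint(std::istream& in, std::uint64_t* value) {
    assert(value != nullptr);
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        const int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        result |= static_cast<std::uint64_t>(c & 0x7F) << shift;
        if ((c & 0x80) == 0) {  // high bit clear: last byte of the varint
            *value = result;
            return true;
        }
    }
    return false;  // more than 10 bytes: not a valid 64-bit varint
}

// Hypothetical scan: load one record at a time and stop at the first
// one the predicate accepts. Later records are never read.
bool FindFirst(std::istream& in,
               const std::function<bool(const std::string&)>& matches,
               std::string* found) {
    std::uint64_t size = 0;
    std::string buffer;  // reused, so memory use is bounded by one record
    while (ReadVarint(in, &size)) {
        buffer.resize(static_cast<std::size_t>(size));
        if (size > 0 &&
            !in.read(&buffer[0], static_cast<std::streamsize>(size))) {
            return false;  // truncated record
        }
        if (matches(buffer)) {
            *found = buffer;
            return true;
        }
    }
    return false;
}
```

In a real program the predicate would call ParseFromString() on the buffer and compare the parsed x and z, so at most one Quad is held in memory at any time.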

answered Feb 15 '13 at 17:37

Marc Gravell


Hey, I am currently reading some of your other answers, which I think are similar to my problem, but I wonder whether protobuf is a reasonable choice for me. So far I have stored the data as plain text, "[x z] y y y y y....." and on the next line again "[x1 z1] y1 y1 y1 y1...", etc. Will protobuf serialization make the file much smaller in that case? – tobi Feb 15 '13 at 19:31
Oh, I have found a good answer: http://stackoverflow.com/questions/7174635/does-protobuf-net-has-build-in-compression-for-serialization The y values range from 0 to 255 (a lot of them will be around 127), so it looks like it will compress well. – tobi Feb 15 '13 at 19:55
