How to read multiple protobufs from one file in java?

Question

I have a file "test.txt" containing multiple protobuf "TestMessage" messages, each written on its own line with testMessage.writeDelimitedTo(...), where the argument is a DataOutputStream wrapping a new FileOutputStream that points to the file. How do I read each line of test.txt and get back the protobuf on that line?

With a bufferedreader on a file containing strings, I would do:

String strLine; // What is the alternative to String?
while ((strLine = br.readLine()) != null) {
    System.out.println(strLine);
    TestMessage test = new TestMessage.builder();
    test.parseDelimitedFrom(strLine);
}

What type do I use instead of String with this approach? Is this even possible?

Or is this not possible, so that each message must be written to a separate file?

NOTE: Assume TestMessage is the only message.

You need to use the Java Protocol Buffers API. It's not a text file, you shouldn't be trying to read it line-by-line. — nobody, Nov 14 '14 at 21:39
So you're saying that if I have 10 messages I want to save in a file and read later, I should use separate files instead? — Rolando, Nov 14 '14 at 21:43
@Rolando not necessarily. You could create your own headers for each object stored in the file as I've described in my answer. If you don't have the time or inclination to create your own header, then yes. You'll have to use separate files for each object. — William Morrison, Nov 14 '14 at 21:46
So if I am using the "writeDelimitedTo" on the protobuf message, how do I find out the length to store? Or should I not be using the writeDelimitedTo function to begin with? — Rolando, Nov 14 '14 at 21:48
In addition, you MUST have and use the matching protoc definition of the actual message structure: without it, there is no real way to access the encoded data. You don't necessarily need to use a generated Java object (depending on the library), but anything that reads protobuf-encoded data needs to be based on the protoc definition. — StaxMan, Nov 14 '14 at 21:48
My question assumes TestMessage is defined and the same builder is used for the write and the read. — Rolando, Nov 14 '14 at 21:49
@Rolando You can use the `writeDelimitedTo` and the corresponding `parseDelimitedFrom`, but you'll still need to include some information about what type of object follows unless you are absolutely sure `TestMessage` is and will always be the only message stored in the stream. — William Morrison, Nov 14 '14 at 21:50
Yes, I am absolutely sure TestMessage is and will always be the only message stored in the stream. Does this mean I have no need for headers and I can just write as is with continual appends to the single file? Is readLine completely out of the question for this? — Rolando, Nov 14 '14 at 21:53
Yes, that's what that means. Yes, readLine is completely out of the question. Use the protobuf `parseDelimitedFrom` method repeatedly on your stream. — William Morrison, Nov 14 '14 at 22:17
Can you show me what this looks like? I am more familiar with reading files line by line and am not familiar with repeatedly calling one method on the stream. — Rolando, Nov 14 '14 at 22:26

terry · Accepted Answer · 2015-04-16T12:25:35.090

Why write each message on its own line? Just use writeDelimitedTo to write the messages one after another; reading them back is then very simple.

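import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// User is the protobuf-generated message class; path is the file to write to.
User user = User.newBuilder().setUid(1).build();
User user2 = User.newBuilder().setUid(2).build();

// Each writeDelimitedTo call writes the message's size followed by its bytes.
// try-with-resources closes the stream even if a write fails.
try (FileOutputStream output = new FileOutputStream(path)) {
    user.writeDelimitedTo(output);
    user.writeDelimitedTo(output);
    user2.writeDelimitedTo(output);
} catch (IOException e) {
    System.out.println("Write error: " + e);
}

// parseDelimitedFrom returns null once the end of the stream is reached.
try (FileInputStream input = new FileInputStream(path)) {
    User user_;
    while ((user_ = User.parseDelimitedFrom(input)) != null) {
        System.out.println("read from file: \n" + user_);
    }
} catch (IOException e) {
    System.out.println("Read error: " + e);
}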
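For anyone curious what writeDelimitedTo actually puts in the file: it writes the message's size as a base-128 varint, then the message bytes, and parseDelimitedFrom reads that size, then exactly that many bytes, returning null at a clean end of stream. That is why no separate length header or newline is needed. The sketch below reproduces that framing with plain byte arrays so it runs without the protobuf library; the class and method names are made up for illustration, and real code should simply call the protobuf methods shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimitedFraming {

    // Writes a non-negative int as a base-128 varint: 7 bits per byte,
    // low bits first, high bit set on every byte except the last.
    static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads a varint; returns -1 if the stream is already at its end.
    static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) return -1; // clean end of stream
                throw new IOException("truncated varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    // Same framing as writeDelimitedTo: length prefix, then the payload bytes.
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        writeVarint(out, payload.length);
        out.write(payload);
    }

    // Same contract as parseDelimitedFrom: returns null at end of stream.
    static byte[] readDelimited(InputStream in) throws IOException {
        int length = readVarint(in);
        if (length < 0) return null;
        byte[] payload = in.readNBytes(length);
        if (payload.length != length) throw new IOException("truncated message");
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        // Stand-in payloads; in real code these would be message.toByteArray().
        writeDelimited(file, "first".getBytes(StandardCharsets.UTF_8));
        writeDelimited(file, new byte[200]); // 200 needs a two-byte varint
        writeDelimited(file, "third".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(file.toByteArray());
        List<Integer> sizes = new ArrayList<>();
        byte[] payload;
        while ((payload = readDelimited(in)) != null) {
            sizes.add(payload.length);
        }
        System.out.println(sizes); // prints [5, 200, 5]
    }
}
```

The read loop has the same shape as the parseDelimitedFrom loop in the answer: keep reading until the reader reports end of stream.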

You should explain your answer more. Just posting the code that will work is not going to help other people with similar problems. — mhlz, Apr 14 '15 at 12:28

William Morrison · Answer 2 · 2014-11-14T21:44:23.513

Protobufs don't have much in common with a line-separated text file. Protobuf is used to turn objects into bytes, a process called serialization, and it is especially focused on compatibility and small size.

The problem you're having is that a serialized protobuf message does not record how many bytes it is made of, or what type it is. So if you store many protobuf-serialized objects in one file, you can't extract them unless you also store, for each one, what type of object follows and how many bytes that object occupies.

This data is referred to as a header.

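import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import com.google.protobuf.Message;

// getObjectID and readObject are application-specific helpers: getObjectID maps a
// message to an integer type ID, and readObject reads `length` bytes from the stream
// and parses them with the protobuf parser registered for that ID.
public void serializeProtobufObject(DataOutputStream stream, Message obj) throws IOException {
    byte[] bytes = obj.toByteArray();
    int id = getObjectID(obj);

    //write protobuf header info
    stream.writeInt(id);
    stream.writeInt(bytes.length);

    //write protobuf payload
    stream.write(bytes, 0, bytes.length);
}

//called repeatedly for many objects in the same stream.
public Message deserializeProtobufObject(DataInputStream stream) throws IOException {
    //read protobuf header
    int id = stream.readInt();
    int length = stream.readInt();

    //use header to interpret payload
    return readObject(id, length, stream);
}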

An integer ID tells you what type of object follows. An integer length tells you how many bytes the object is composed of. When you deserialize, you use these two pieces of information to extract the protobuf object. You do this repeatedly if you have many protobuf objects in the same stream.
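A runnable sketch of this ID-plus-length header, using DataOutputStream.writeInt and DataInputStream.readInt for the two integers. The type IDs, the Frame record and the string payloads are hypothetical stand-ins: in real code each payload would be a message's toByteArray(), and the type ID would select the matching protobuf parser. End of stream is detected by catching the EOFException that readInt throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HeaderFraming {

    // Hypothetical type IDs; a real program would assign one per protobuf message class.
    static final int TYPE_FOO = 1;
    static final int TYPE_BAR = 2;

    record Frame(int typeId, byte[] payload) {}

    // Header: 4-byte type ID, 4-byte payload length (both big-endian), then the payload.
    static void writeFrame(DataOutputStream out, int typeId, byte[] payload) throws IOException {
        out.writeInt(typeId);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Returns null if the stream ends before a complete type ID can be read.
    static Frame readFrame(DataInputStream in) throws IOException {
        int typeId;
        try {
            typeId = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if the payload is truncated
        return new Frame(typeId, payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file)) {
            // Stand-in payloads; in real code these would be fooMessage.toByteArray(), etc.
            writeFrame(out, TYPE_FOO, "foo-bytes".getBytes(StandardCharsets.UTF_8));
            writeFrame(out, TYPE_BAR, "bar".getBytes(StandardCharsets.UTF_8));
        }

        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray()))) {
            Frame frame;
            while ((frame = readFrame(in)) != null) {
                // The type ID decides how to interpret the payload.
                String kind = switch (frame.typeId()) {
                    case TYPE_FOO -> "Foo";
                    case TYPE_BAR -> "Bar";
                    default -> throw new IOException("unknown type id " + frame.typeId());
                };
                System.out.println(kind + " (" + frame.payload().length + " bytes)");
            }
        }
    }
}
```

If only one message type is ever stored, the type ID can be dropped, and the length alone is what writeDelimitedTo already writes for you.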

A better approach is to define a protobuf message for these two fields and write objects to your stream like so:

ProtobufHeader for Foo
[Foo]
ProtobufHeader for Bar
[Bar]

This would allow you to expand your protobuf header in the future.

How does "readObject" work, since it seems you cannot read line by line? Each file is "appended" to a new line of the one file I am working with. — Rolando, Nov 14 '14 at 22:07
`readObject` extracts `length` bytes from the stream and attempts to parse them, using protobuf's read routines, as the object type mapped to the integer type ID. Also, toss out the notion of newlines as some kind of delimiter with protobufs. Newlines mean nothing in protobuf. — William Morrison, Nov 14 '14 at 22:18
I was expecting to be able to write out bytes as text per line, then read them back out into objects. — Rolando, Nov 14 '14 at 22:25
This is dangerous because your serialized protobuf object could contain a new line character. — William Morrison, Nov 14 '14 at 22:46
I see. Could you further expound on your example specifically the readObject? I am having trouble seeing how it matches with calling the parseDelimitedFrom over and over again. — Rolando, Nov 15 '14 at 00:13
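To make the newline concern concrete: protobuf encodes a varint field number 1 with the tag byte 0x08 and the value 10 as the single byte 0x0A, which is exactly '\n'. The sketch below assumes a message like the User above with uid stored as field 1 of a varint type (an assumption; the actual field number depends on the .proto file), hand-encodes two such messages with uid = 10, writes them "one per line", and reads them back with BufferedReader.readLine. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NewlineInPayload {

    // Splits the bytes the way a line-by-line reader would.
    static List<String> readLines(byte[] data) throws IOException {
        List<String> lines = new ArrayList<>();
        // ISO-8859-1 maps every byte to exactly one char, so decoding loses nothing.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed encoding of a message with varint field 1 set to 10:
        // tag 0x08 (field 1, wire type 0), then value 0x0A, which is '\n'.
        byte[] message = {0x08, 0x0A};

        // "One message per line": each message followed by a '\n' separator.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (int i = 0; i < 2; i++) {
            file.write(message);
            file.write('\n');
        }

        System.out.println(readLines(file.toByteArray()).size()); // prints 4, not 2
    }
}
```

Two messages come back as four "lines", and none of them holds a complete message, which is why the thread recommends length-prefixed framing instead of readLine.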
