9

I am trying to dynamically parse a given .proto file in Java to decode a Protobuf-encoded binary.

I have the following parsing method, in which the "proto" string contains the content of the .proto file:

public static Descriptors.FileDescriptor parseProto (String proto) throws InvalidProtocolBufferException, Descriptors.DescriptorValidationException {
        DescriptorProtos.FileDescriptorProto descriptorProto = DescriptorProtos.FileDescriptorProto.parseFrom(proto.getBytes());
        return Descriptors.FileDescriptor.buildFrom(descriptorProto, null);
}

However, when executed, this method throws an exception with the message "Protocol message tag had invalid wire type.". I am using the example .proto file from Google, so I assume it is valid: https://github.com/google/protobuf/blob/master/examples/addressbook.proto

Here is the stack trace:

15:43:24.707 [pool-1-thread-1] ERROR com.github.whiver.nifi.processor.ProtobufDecoderProcessor - ProtobufDecoderProcessor[id=42c8ab94-2d8a-491b-bd99-b4451d127ae0] Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:115)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:551)
    at com.google.protobuf.GeneratedMessageV3.parseUnknownField(GeneratedMessageV3.java:293)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:88)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:53)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:773)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:768)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:163)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:197)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.parseFrom(DescriptorProtos.java:260)
    at com.github.whiver.nifi.parser.SchemaParser.parseProto(SchemaParser.java:9)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.lambda$onTrigger$0(ProtobufDecoderProcessor.java:103)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:895)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:62)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.onTrigger(ProtobufDecoderProcessor.java:100)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:251)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:245)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Any idea? Thank you!

WSH

4 Answers

10

It looks like you're trying to use FileDescriptorSet.parseFrom to populate a FileDescriptorSet. This will only work if the bytes you're providing are binary protobuf contents, that is, a compiled schema. You can get a compiled schema by using the protoc command-line tool with the --descriptor_set_out option. What you're actually passing right now is the bytes of the text schema, which is not what parseFrom expects.
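To illustrate why raw .proto text trips the binary parser, here is a minimal sketch. The class `TagPeek` and its methods are hypothetical names for illustration, not part of protobuf-java. A protobuf tag byte carries the wire type in its low three bits, and only the values 0 to 5 are defined. Assuming the schema text starts with a `//` comment, the first byte `/` (0x2F) decodes to field 5 with wire type 7, which is not a valid wire type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: splits a single-byte protobuf tag into its two parts.
final class TagPeek {
    // The low three bits of a tag hold the wire type (0..5 are defined).
    static int wireType(int tagByte) {
        return tagByte & 0x07;
    }

    // The remaining bits hold the field number (valid for single-byte tags only).
    static int fieldNumber(int tagByte) {
        return (tagByte & 0x7F) >>> 3;
    }

    // Interprets the first byte of some text as if it were a protobuf tag.
    static String describeFirstByte(String protoText) {
        byte first = protoText.getBytes(StandardCharsets.UTF_8)[0];
        return "field=" + fieldNumber(first) + " wireType=" + wireType(first);
    }

    public static void main(String[] args) {
        // Assumption: the .proto text begins with a "//" comment line.
        System.out.println(describeFirstByte("// a comment line"));
        // prints "field=5 wireType=7"
    }
}
```

A binary parser reading this text would therefore see a tag with an undefined wire type on the very first byte, which matches the reported error message.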

Without a compiled schema, you would need a runtime .proto parser. I'm not aware of one for Java; protobuf-net includes one (protobuf-net.Reflection), but that is C#/.NET. Without an available runtime .proto parser, you'd need to shell-execute protoc instead.
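Shelling out to protoc from Java could look roughly like the sketch below. It assumes that a `protoc` binary is available on the PATH; `ProtocRunner`, `buildCommand`, and `compile` are hypothetical names, and the paths in `main` are placeholders. The flags `--proto_path`, `--include_imports`, and `--descriptor_set_out` are standard protoc options.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that invokes protoc to write a compiled descriptor set.
final class ProtocRunner {
    // Builds the protoc command line without running it.
    static List<String> buildCommand(Path protoDir, Path protoFile, Path descriptorOut) {
        List<String> cmd = new ArrayList<>();
        cmd.add("protoc"); // assumption: protoc is on the PATH
        cmd.add("--proto_path=" + protoDir);
        cmd.add("--include_imports"); // also emit descriptors for imported files
        cmd.add("--descriptor_set_out=" + descriptorOut);
        cmd.add(protoFile.toString());
        return cmd;
    }

    // Runs protoc and returns its exit code (0 means success).
    static int compile(Path protoDir, Path protoFile, Path descriptorOut)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(protoDir, protoFile, descriptorOut))
                .inheritIO() // forward protoc's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths for illustration only.
        int exitCode = compile(
                Paths.get("schemas"),
                Paths.get("schemas", "addressbook.proto"),
                Paths.get("addressbook.desc"));
        System.out.println("protoc exited with " + exitCode);
    }
}
```

The file written by `--descriptor_set_out` is a binary FileDescriptorSet, so its bytes are the kind of input that FileDescriptorSet.parseFrom expects.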

Marc Gravell
  • I see, this makes sense. I will try to find a way to compile my proto file. Thank you for your answer! – WSH Dec 04 '17 at 14:26
  • @WSH it *should* just be `protoc --descriptor_set_out` – Marc Gravell Dec 04 '17 at 14:32
  • Okay, thanks for the clarification. However, if anyone has ever heard of a Java proto compiler, I'm still interested :) – WSH Dec 04 '17 at 14:46
  • @WSH There probably isn't a Java implementation. The protocol compiler is more complicated than you might expect, and maintaining multiple implementations of the compiler in many languages wouldn't make a lot of sense. What you should try to do is arrange to have your .protos parsed using --descriptor_set_out *offline*, then send around the compiled descriptor set as needed, rather than try to parse the whole .proto file on-demand. – Kenton Varda Dec 05 '17 at 04:12
  • Hey there @kenton, great to see you still chiming in on protobuf questions. And yes, I found some of those complications when I finally got around to writing a 100% c# implementation and running it through every .proto I could find including most of the public Google API surface :) – Marc Gravell Dec 05 '17 at 07:51
  • @Kenton I understand, I have found a workaround to avoid having to parse the .proto files. Btw, I have also found this project which seems to be a full implementation of Protobuf in Java and is said to support .proto files parsing: https://github.com/square/wire/ Maybe I'll give it a look. – WSH Dec 05 '17 at 15:47
  • One correction `FileDescriptorSet` https://developers.google.com/protocol-buffers/docs/techniques#self-description – George Campbell Apr 28 '20 at 08:49
  • @GeorgeCampbell ta – Marc Gravell Apr 28 '20 at 10:46
  • @MarcGravell can you show how to do it in protobuf-net? Task: given protobuf as a string, get the C# class by making a call in C# code. – morpheus Mar 11 '23 at 19:50
  • @morpheus protobuf-net.Reflection is the library that has all the parsing and code-gen; here's the main file from `protogen`, which exposed this in a command-line interface like `protoc`: https://github.com/protobuf-net/protobuf-net/blob/main/src/protogen/Program.cs - look for `.Process` and `.Generate`. the same tools are also available via Roslyn plugins, a website, a "dotnet tool", etc – Marc Gravell Mar 11 '23 at 21:10
2

Drawing from the other answers, here's a snippet of working Kotlin code from a library I'm developing. https://github.com/asarkar/okgrpc

private fun lookupProtos(
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path,
    resolved: MutableSet<String>
): List<DescriptorProtos.FileDescriptorProto> {
    val schema = generateSchema(protoPaths, protoFile, tempDir)
    return schema.fileList
        .filter { resolved.add(it.name) }
        .flatMap { fd ->
            fd.dependencyList
                .filterNot(resolved::contains)
                .flatMap { lookupProtos(protoPaths, it, tempDir, resolved) } + fd
        }
}

private fun generateSchema(
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path
): DescriptorProtos.FileDescriptorSet {
    val outFile = Files.createTempFile(tempDir, null, null)
    val stderr = ByteArrayOutputStream()
    val exitCode = Protoc.runProtoc(
        (protoPaths.map { "--proto_path=$it" } + listOf("--descriptor_set_out=$outFile", protoFile)).toTypedArray(),
        DevNull,
        stderr
    )
    if (exitCode != 0) {
        throw IllegalStateException("Failed to generate schema for: $protoFile")
    }
    return Files.newInputStream(outFile).use { DescriptorProtos.FileDescriptorSet.parseFrom(it) }
}

The idea is to use os72/protoc-jar to write out a compiled schema/file descriptor. Then use FileDescriptorSet.parseFrom to read that file, and recurse on its dependencies.

Abhijit Sarkar
0

An alternative to "shelling out" to exec protoc would be to use a .proto parser written in Java. There seem to be a few floating around; search for something like "proto parser in java". (I'm looking for one for an issue in my project.)

vorburger
-1

Don't use a Java String to hold the protobuf payload. The issue is that String performs character-set conversions behind the scenes and makes assumptions about the encoding.

Protobuf works on byte arrays, and the exact representation in the array has to be unchanged. Going to and from String does not work.
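As a small illustration of this point for binary payloads (it does not apply to .proto schema text, which is plain text), the sketch below pushes a three-byte protobuf message through a String and back. The class name `StringRoundTrip` and the method `viaString` are hypothetical, and UTF-8 is an assumed charset. The byte 0x96 is not valid UTF-8 on its own, so the decoder replaces it and the bytes change.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: a binary payload does not survive a trip through String.
final class StringRoundTrip {
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes.
    static byte[] viaString(byte[] payload) {
        String asText = new String(payload, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Field 1, varint value 150, encoded as 08 96 01.
        byte[] payload = {0x08, (byte) 0x96, 0x01};
        byte[] roundTripped = viaString(payload);

        // The invalid byte 0x96 becomes U+FFFD, which encodes as three bytes.
        System.out.println(Arrays.equals(payload, roundTripped)); // prints "false"
        System.out.println(roundTripped.length);                  // prints "5"
    }
}
```

Keeping the payload as a `byte[]` from the point it is read to the point it is parsed avoids this kind of silent corruption.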

Bob Dalgleish
  • That depends on whether they're loading the *data*, vs loading a *schema*. A schema (in .proto format) is text. – Marc Gravell Dec 04 '17 at 14:14
  • As Bob said, I am trying to parse a text file so I guess String should not be a problem. – WSH Dec 04 '17 at 14:27