I'm trying to reverse-engineer and document several protobufs for which I don't have the descriptor metadata, and to create .proto files for them. It had been going great until I encountered a protobuf where two completely differently structured messages share the same field number (the "unique ID"). The top level is simple:

message main
{
    string user=1;
    repeated Section sections=2;
}

Looking at the Section type, there are some that look like this:

message Section
{
    string name=1;
    string fulldescription=2;
    string briefdescription=3;
    int32 level=4;
    ...
}

...and some that look like this:

message Section
{
    int32 cost=1;
    int32 tier=2;
    int64 timestamp=3;
    ...
}

It would make perfect sense if one of these had the id of 2 and the other of, say, 3, but no, both types show with the unique ID of 2. The protobuf documentation very clearly states that each field in the message definition must have a unique number, and here is a perfectly valid (well, working) protobuf that does the exact opposite. I don't understand how it's possible or, more importantly, how to re-create this in the .proto file?

This does not seem to be a "oneof" situation either, since both message types are present in the same protobuf at the same time, and at any rate the "oneof" alternatives would still have different identifiers for fields of different types.

For reference, here's an example excerpt from the output generated by `protoc --decode_raw`, which I'm trying to document:

    1 {
      1: "User123"
      2 {
        1 {
          1: "JohnDoe"
          2: "This is the full description of the user, with a lot of details"
          3: "A brief summary of the above"
          4: 13
        }
      }
      2 {
        1 {
          1: 135
          2: 2
          3: 1653606400
        }
      }
    }
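
As a point of reference for reading that dump: on the wire, each entry carries only a field number and a wire type, not a message type name, and strings, bytes and sub-messages all share wire type 2 (length-delimited). The sketch below hand-encodes two differently shaped payloads under field 2 to show that they end up with the same key byte. The helper names are hypothetical, the values are borrowed from the excerpt above, and the extra `1 { ... }` nesting level seen in the dump is left out for brevity.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    # A field key on the wire is (field_number << 3) | wire_type.
    return encode_varint((field_number << 3) | wire_type)


def varint_field(field_number: int, value: int) -> bytes:
    # Wire type 0: int32, int64, bool, enum, ...
    return tag(field_number, 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    # Wire type 2: strings, bytes and embedded messages alike.
    return tag(field_number, 2) + encode_varint(len(payload)) + payload


# Two payloads with unrelated shapes (hypothetical stand-ins for the two "Section" variants).
text_section = bytes_field(1, b"JohnDoe") + varint_field(4, 13)
numeric_section = varint_field(1, 135) + varint_field(2, 2) + varint_field(3, 1653606400)

# Both are written under field number 2 of the outer message.
message = (
    bytes_field(1, b"User123")
    + bytes_field(2, text_section)
    + bytes_field(2, numeric_section)
)

# The key byte in front of each section is identical: field 2, wire type 2.
section_keys = [bytes_field(2, s)[0] for s in (text_section, numeric_section)]
```

Both entries of `section_keys` are `0x12`, so a raw decoder has no way to tell from the key alone that the two payloads follow different layouts.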

(This post seems to be asking the same thing, but it's old and doesn't have an actual answer: Can you assign multiple different value types to one field in a repeated Protobuf message?)

(This is my very first StackOverflow question, so apologies if the quality of the post is not up to snuff; please let me know what I need to add to make it clearer.)

Ioj
  • I would recommend checking the package/imports at the top of the main message definition, because I believe only one of the Section messages is used here. It is not possible to have two different messages used in the same repeated field. This would cause inconsistent deserialization (maybe an error?) – Clément Jean May 27 '22 at 02:13
  • Clément, that is *exactly* the issue here. The definitions are mine, shown here just as an illustration of how I would document the protobuf given what I see after decoding it with `protoc --decode_raw`. And what I see is clearly two different messages of the same type, using the same ID (2). If such a setup is possible to achieve with something other than "repeated", that's probably going to be the answer to my question :) I've ruled out "oneof", and "any" isn't looking like it can do this, but there must be something, because the protobuf is being generated *somehow*. – Ioj May 27 '22 at 04:35
  • Maybe [`struct`](https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/struct) ? This is basically a dynamic document. That's the only thing I can see. – Clément Jean May 27 '22 at 06:34
  • `struct` would be something to use in the parser code, no? (My code is in Python, btw). What I need first, however, is a way to define the field in the `.proto` file. So what I ended up doing is replacing `repeated Section sections=2;` with `repeated bytes sections=2;` and not defining `Section` at all. This way messages with the ID of 2 come through as bytes (base64-encoded content), and the parser code can later isolate them individually and parse them depending on the contents detected. Not the most elegant solution I'm sure, so if anyone has a better one, it would be much appreciated. – Ioj May 27 '22 at 23:30
  • I actually think you are right, this seems to be `bytes`, because I tried with both `any` and `struct` and they don't match the same format. – Clément Jean May 28 '22 at 03:39
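
A rough sketch of how the `repeated bytes` workaround from the comments could be finished off in Python, assuming each payload can be told apart by the wire type of its field 1 (a string in one layout, a varint in the other, as in the two `Section` definitions in the question). The function names and labels are hypothetical, only wire types 0 and 2 are handled, and the extra `1 { ... }` level shown in the raw dump would need to be unwrapped first.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> List[Tuple[int, int, object]]:
    """Split a serialized message into (field_number, wire_type, value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


def classify_section(payload: bytes) -> str:
    """Guess which layout a payload follows from the wire type of its field 1."""
    wire_types = {number: wire_type for number, wire_type, _ in parse_fields(payload)}
    if wire_types.get(1) == 2:
        return "text"      # name / fulldescription / briefdescription / level layout
    if wire_types.get(1) == 0:
        return "numeric"   # cost / tier / timestamp layout
    return "unknown"


# Hand-built example payloads: field 1 = "JohnDoe", field 4 = 13; and field 1 = 135, field 2 = 2.
text_payload = b"\x0a\x07JohnDoe\x20\x0d"
numeric_payload = b"\x08\x87\x01\x10\x02"
kinds = [classify_section(p) for p in (text_payload, numeric_payload)]
```

Once a payload is classified, it could be handed to the generated class for the matching layout (for example via `ParseFromString`), with each layout defined as its own message in the `.proto` file.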

0 Answers