
What I'm trying to do: I have a dump from a Kafka stream that contains an unknown number of protobuf records in binary format. I want to decode them and print them to the console one by one as JSON. I have looked all over the internet, but there seems to be no clear answer on reading data from a raw binary file that holds an unknown number of protobuf records. I found How to decode binary/raw google protobuf data, but it is about decoding a single, known record with protoc.

I've tried the following, but I don't fully understand how to work with the proto.Buffer struct (proto/buffer.go): I only ever see the first value out of the whole 26 KB of data.

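package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"

    "github.com/golang/protobuf/proto"
    "parseRawDHCP/pb"
)

func main() {
    file, err := ioutil.ReadFile("file")
    if err != nil {
        // Stop here: there is nothing to decode without the file.
        log.Fatalf("unable to read file: %v", err)
    }
    buffer := proto.NewBuffer(file)
    for {
        // Start from an empty message each time: DecodeMessage merges
        // into the message it is given instead of resetting it.
        msg := pb.Msg{}
        if err := buffer.DecodeMessage(&msg); err != nil {
            panic(fmt.Sprintf("unable to decode message: %v", err))
        }
        // Marshal a pointer so the message struct is not copied.
        marshalledStruct, err := json.Marshal(&msg)
        if err != nil {
            panic(fmt.Sprintf("can't marshal the message to JSON: %v", err))
        }
        // %s prints the JSON text; %v would print the raw byte values.
        fmt.Printf("message is: %s\n", marshalledStruct)
    }
}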

If someone can point me in the right direction on how to correctly decode this raw binary into protobuf messages, I would greatly appreciate it.

Igor Tiulkanov

1 Answer


A proto message by itself comes with no length and no end-of-message indication.

If your file contains marshalled proto messages all jammed together, then there's no way to decode them individually. An attempt to decode multiple messages as a single one will decode everything into a single struct: later values overwrite earlier ones for singular fields, and repeated fields accumulate the elements of every message.
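To make the concatenation problem concrete, here is a small stdlib-only sketch that hand-encodes two one-field messages and parses their concatenation as if it were one message. `lastVarintFields` is a hypothetical helper, not part of any protobuf library, and it only understands varint fields (wire type 0). It mimics the protobuf parsing rule that when a singular scalar field appears more than once, the last value wins, so nothing in the bytes tells you where the first message ended.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// lastVarintFields is a hypothetical helper that walks a protobuf
// wire-format buffer containing only varint fields (wire type 0) and
// returns the last value seen for each field number. This mirrors what a
// protobuf parser does when the same singular scalar field appears twice.
func lastVarintFields(buf []byte) (map[uint64]uint64, error) {
	fields := make(map[uint64]uint64)
	for len(buf) > 0 {
		// Each field starts with a varint tag: (field number << 3) | wire type.
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errors.New("malformed tag")
		}
		buf = buf[n:]
		if tag&7 != 0 {
			return nil, errors.New("only varint fields are handled in this sketch")
		}
		val, m := binary.Uvarint(buf)
		if m <= 0 {
			return nil, errors.New("malformed varint value")
		}
		buf = buf[m:]
		fields[tag>>3] = val // a later occurrence overwrites an earlier one
	}
	return fields, nil
}

func main() {
	a := []byte{0x08, 0x96, 0x01} // message A: field 1 = 150
	b := []byte{0x08, 0x07}       // message B: field 1 = 7
	joined := append(append([]byte{}, a...), b...)

	fields, err := lastVarintFields(joined)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields[1]) // 7: message A's value has been overwritten
}
```

The joined bytes parse without any error, which is exactly why a decoder cannot tell that two messages were present.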

If your file contains length-prefixed messages (see buffer.EncodeMessage), then your sample code should be able to decode them (and panic at EOF). But I doubt that they were serialized that way.
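For data that was framed with a length prefix, splitting it is mechanical: read a varint length, slice off that many bytes, repeat. Below is a minimal sketch using only `encoding/binary`, assuming the framing is an unsigned varint length followed by the message bytes (protobuf varints use the same encoding as `binary.Uvarint`). `splitDelimited` is a hypothetical helper name, and it does not call the protobuf library so that it runs without the generated `pb` package. It returns when the data runs out instead of failing at EOF.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitDelimited is a hypothetical helper that splits a buffer of
// varint-length-prefixed messages into the raw bytes of each message.
// The returned slices share memory with buf.
func splitDelimited(buf []byte) ([][]byte, error) {
	var msgs [][]byte
	for len(buf) > 0 {
		size, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, errors.New("malformed length prefix")
		}
		buf = buf[n:]
		if size > uint64(len(buf)) {
			return nil, errors.New("length prefix runs past the end of the data")
		}
		msgs = append(msgs, buf[:int(size)])
		buf = buf[int(size):]
	}
	return msgs, nil
}

func main() {
	// Two framed messages: [len=3][08 96 01] [len=2][08 07]
	data := []byte{0x03, 0x08, 0x96, 0x01, 0x02, 0x08, 0x07}
	msgs, err := splitDelimited(data)
	if err != nil {
		panic(err)
	}
	for i, m := range msgs {
		fmt.Printf("message %d: % x\n", i, m)
	}
}
```

Each returned slice could then be passed to `proto.Unmarshal` together with a fresh `pb.Msg`. On plain `proto.Marshal` output jammed together, the first tag byte would be misread as a length, so this either fails or splits at meaningless boundaries.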

rustyx
  • Yeah. I'm trying to read a binary dump from the Kafka console consumer to check some values inside the protobuf messages. They were most likely marshalled with a proto.Marshal() call, so I don't think they carry any length information I can use. – Igor Tiulkanov Sep 29 '21 at 07:55
  • You can try prepending the length (in varint encoding) to every message, don't know if Kafka tooling can do that. – rustyx Sep 29 '21 at 11:43
  • Unfortunately, I don't control the incoming data – Igor Tiulkanov Sep 29 '21 at 17:00
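Prepending the length, as suggested in the comments, only works wherever each message is still available as its own byte slice before it is written out. A minimal sketch under that assumption: `frameDelimited` is a hypothetical helper that takes already-serialized messages (for example `proto.Marshal` output) and emits an unsigned varint length in front of each one. Whether any Kafka tooling can do this for you is not covered here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameDelimited is a hypothetical helper that prefixes each
// already-serialized message with its length as an unsigned varint,
// so a reader can later split the stream back into individual messages.
func frameDelimited(msgs [][]byte) []byte {
	var out []byte
	var lenBuf [binary.MaxVarintLen64]byte
	for _, m := range msgs {
		n := binary.PutUvarint(lenBuf[:], uint64(len(m)))
		out = append(out, lenBuf[:n]...) // length prefix
		out = append(out, m...)          // message bytes
	}
	return out
}

func main() {
	framed := frameDelimited([][]byte{
		{0x08, 0x96, 0x01}, // field 1 = 150
		{0x08, 0x07},       // field 1 = 7
	})
	fmt.Printf("% x\n", framed) // 03 08 96 01 02 08 07
}
```

Lengths of 128 bytes or more take more than one prefix byte (200, for instance, becomes `c8 01`), which is why the prefix is a varint rather than a single fixed byte.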