I have created a test app that recognizes images using Google Goggles. It works, but the response I receive is binary protobuf. I have no .proto files, only the binary response. How can I extract data from it? (I sent an image of a bottle of beer and got the following response):

A
TuborgLogo9 HoaniText���;�)b���2d8e991bff16229f6"�
+TR=T=AQBd6Cl4Kd8:X=OqSEi:S=_rSozFBgfKt5d9b0
+TR=T=6rLQxKE2xdA:X=OqSEi:S=gd6Aqb28X0ltBU9V
+TR=T=uGPf9zJDWe0:X=OqSEi:S=32zTfdIOdI6kuUTa
+TR=T=RLkVoGVd92I:X=OqSEi:S=P7yOhvSAOQW6SRHN
+TR=T=J1FMvNmcyMk:X=OqSEi:S=5Z631_rd2ijo_iuf�

I need to extract the string "Tuborg" and, if possible, the type "Logo".

Benjamin Loison
Igor
  • What languages do you have at your disposal? there are decoding-stream-readers available in most implementations. I worry, however, about how that data has been encoded. Do you have the actual `byte[]` data? (I ask re the question, as that looks like badly-encoded (aka corrupt) data) – Marc Gravell Oct 27 '11 at 10:50
  • no, data is ok. I have read response as byte[] array and it jusrt String representation os this data. Below is another response for image with Steve Jobs name: Z Steve Jobs Text # Manifesto da Morto Similar Image???;?~Ui4{C~27e437b3469557e98"? +TR=T=2NKSRNijdzY:X=Op9HS:S=Q5E2GgGTn2FHbvXR +TR=T=-hAbOrM2yME:X=Op9HS:S=XJ0SUV2EWG1A2Z7O +TR=T=4fEn46Y2-xM:X=Op9HS:S=xIk9lP93EkhBQroz +TR=T=-s6bJFuLuRo:X=Op9HS:S=cPnqT3_nWM61zYnv +TR=T=2hSkrpEoO10:X=Op9HS:S=jsH5Uv1-X9WMQNoN? – Igor Oct 31 '11 at 09:18
  • "jusrt String representation os this data" the only "string representations" that could make sense here are: base-64 or hex-encoding. It looks like you are running that through UTF-8 or something, which is ***not valid*** and will lose data. Also, and I repeat: what languages do you have at your disposal? – Marc Gravell Oct 31 '11 at 09:22
  • I'm using JAVA, here is link to my code (converted it from C# sample) http://prodroid.com.ua/?p=385 Thanks! – Igor Nov 01 '11 at 08:17
  • if the response is binary, you ***cannot*** use `readLine` - that is incorrect and will totally corrupt the data. You ***must*** read it as binary. – Marc Gravell Nov 01 '11 at 08:23
  • Yes, I understand; that is just for demonstration. In the real sample I read the bytes into a byte[] array. (If you then print this array as a string, you see the data shown above.) – Igor Nov 01 '11 at 12:55
  • @Igor Have you got any way to do what you have asked ? I am also facing the same problem as you – Deepak Aug 10 '15 at 05:49
  • @MarcGravell Sir i am facing the same problem .Have you any idea how to convert to equivalent string representation in java. – Deepak Aug 12 '15 at 04:17
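
On the question of an equivalent string representation in Java: the two lossless options named in the comments are Base64 and hex. Here is a minimal sketch using only the JDK. The class name `BinaryToText` and the sample bytes are made up for illustration and are not taken from a real Goggles response. It also shows why running the bytes through UTF-8 loses data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryToText {
    /** Lossless: every byte becomes exactly two hex digits. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Lossless: standard Base64 round-trips arbitrary bytes. */
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** True if decoding as UTF-8 and re-encoding gives back the same bytes. */
    static boolean survivesUtf8(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical sample: a short protobuf string field followed by
        // two bytes that are not valid UTF-8.
        byte[] response = {0x0a, 0x06, 'T', 'u', 'b', 'o', 'r', 'g', (byte) 0xff, (byte) 0xfe};

        System.out.println(toHex(response));        // 0a065475626f7267fffe
        System.out.println(toBase64(response));
        // Invalid sequences are replaced with U+FFFD, so the original bytes are gone.
        System.out.println(survivesUtf8(response)); // false
    }
}
```

Either string can be pasted into a comment or a log and turned back into the exact bytes later, which is not true of the text dumps posted above.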

3 Answers


You can decode with protoc:

protoc --decode_raw < msg.bin
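
To get a msg.bin for that command from Java, the response body has to be written to disk byte for byte. A minimal sketch, assuming you already have the response as an `InputStream`; the class name `SaveResponse` and its method names are illustrative, not part of any library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveResponse {
    /** Reads the stream to the end as raw bytes: no Reader, no readLine, no charset. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Writes the bytes unchanged so that protoc can read them from the file. */
    static void save(byte[] body, Path target) throws IOException {
        Files.write(target, body);
    }
}
```

With an HTTP connection this would be called roughly as `SaveResponse.save(SaveResponse.readFully(connection.getInputStream()), Paths.get("msg.bin"))`, where `connection` stands for whatever object your code uses to talk to the server.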
To1ne
UnknownFieldSet.parseFrom(msg).toString()

This will show you the top-level fields. Unfortunately, it can't know the exact field types: long, int, bool, enum, etc. are all encoded as varints and look the same. Strings, byte arrays and sub-messages are all length-delimited and are likewise indistinguishable.
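
To make the varint point concrete, here is a small hand-rolled decoder using only the JDK. It is a sketch for illustration (the class name `VarintDemo` is made up) and not the protobuf library's own code: each byte carries 7 data bits, least significant group first, and the high bit says whether another byte follows.

```java
public class VarintDemo {
    /** Decodes one base-128 varint starting at the given offset. */
    static long readVarint(byte[] buf, int offset) {
        long result = 0;
        int shift = 0;
        for (int i = offset; i < buf.length && shift < 64; i++, shift += 7) {
            byte b = buf[i];
            result |= (long) (b & 0x7f) << shift; // low 7 bits are payload
            if ((b & 0x80) == 0) {                // high bit clear: last byte
                return result;
            }
        }
        throw new IllegalArgumentException("truncated or overlong varint");
    }

    public static void main(String[] args) {
        System.out.println(readVarint(new byte[] {(byte) 0x96, 0x01}, 0)); // 150
        // The single byte 0x01 decodes to 1, which could equally be an int 1,
        // a bool true, or an enum constant; the wire does not say which.
        System.out.println(readVarint(new byte[] {0x01}, 0)); // 1
    }
}
```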

Some useful details here: https://github.com/dcodeIO/protobuf.js/wiki/How-to-reverse-engineer-a-buffer-by-hand

If you follow the code in UnknownFieldSet.mergeFrom() you'll see how you could try to decode sub-messages and fall back to strings if that fails, but it's not going to be very reliable.

There are 2 spare values for the wire type in the protocol. It would have been really helpful if Google had used one of these to denote sub-messages (and perhaps the other for null values).

Here's some very crude, rushed code which attempts to produce something useful for diagnostics. It guesses at the data types, and for strings and sub-messages it will in some cases print both alternatives. Please don't trust any values it prints:

import com.google.protobuf.ByteString;
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.WireFormat;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Indentation unit used when singleLine is false.
private static final String INDENT = "  ";

public static String decodeProto(byte[] data, boolean singleLine) throws IOException {
    return decodeProto(ByteString.copyFrom(data), 0, singleLine);
}

public static String decodeProto(ByteString data, int depth, boolean singleLine) throws IOException {
    final CodedInputStream input = CodedInputStream.newInstance(data.asReadOnlyByteBuffer());
    return decodeProtoInput(input, depth, singleLine);
}

private static String decodeProtoInput(CodedInputStream input, int depth, boolean singleLine) throws IOException {
    StringBuilder s = new StringBuilder("{ ");
    boolean foundFields = false;
    while (true) {
        final int tag = input.readTag();
        int type = WireFormat.getTagWireType(tag);
        // A tag of 0 means end of input; END_GROUP closes the group we are inside.
        if (tag == 0 || type == WireFormat.WIRETYPE_END_GROUP) {
            break;
        }
        foundFields = true;
        protoNewline(depth, s, singleLine);

        final int number = WireFormat.getTagFieldNumber(tag);
        s.append(number).append(": ");

        switch (type) {
            case WireFormat.WIRETYPE_VARINT:
                // Could be int, long, bool or enum; printed as a long.
                s.append(input.readInt64());
                break;
            case WireFormat.WIRETYPE_FIXED64:
                // Guess: treat 64-bit values as double.
                s.append(Double.longBitsToDouble(input.readFixed64()));
                break;
            case WireFormat.WIRETYPE_LENGTH_DELIMITED:
                ByteString data = input.readBytes();
                try {
                    // First try to parse the payload as a nested message.
                    String submessage = decodeProto(data, depth + 1, singleLine);
                    // Short payloads without low control characters may also be
                    // strings, so print that alternative as well.
                    if (data.size() < 30) {
                        boolean probablyString = true;
                        String str = new String(data.toByteArray(), StandardCharsets.UTF_8);
                        for (char c : str.toCharArray()) {
                            if (c < '\n') {
                                probablyString = false;
                                break;
                            }
                        }
                        if (probablyString) {
                            s.append("\"").append(str).append("\" ");
                        }
                    }
                    s.append(submessage);
                } catch (IOException e) {
                    // Not a valid message: fall back to printing it as a string.
                    s.append('"').append(new String(data.toByteArray(), StandardCharsets.UTF_8)).append('"');
                }
                break;
            case WireFormat.WIRETYPE_START_GROUP:
                // Groups share the same stream; recurse until the matching END_GROUP.
                s.append(decodeProtoInput(input, depth + 1, singleLine));
                break;
            case WireFormat.WIRETYPE_FIXED32:
                // Guess: treat 32-bit values as float.
                s.append(Float.intBitsToFloat(input.readFixed32()));
                break;
            default:
                throw new InvalidProtocolBufferException("Invalid wire type");
        }

    }
    if (foundFields) {
        protoNewline(depth - 1, s, singleLine);
    }
    return s.append('}').toString();
}

// Appends either a single space (single-line mode) or a newline plus indentation.
private static void protoNewline(int depth, StringBuilder s, boolean noNewline) {
    if (noNewline) {
        s.append(" ");
        return;
    }
    s.append('\n');
    for (int i = 0; i <= depth; i++) {
        s.append(INDENT);
    }
}
AutomatedMike

I'm going to assume the real question is how to decode protobufs and not how to read binary from the wire using Java.

The answer to your question can be found here

Briefly, on the wire, protobufs are encoded as 3-tuples of <key,type,value>, where:

  • the key is the field number assigned to the field in the .proto schema
  • the type is one of Varint, 32-bit, length-delimited, start-group, end-group, or 64-bit. It contains just enough information to decode the value of the 3-tuple; namely, it tells you how long the value is.
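
On the wire, the key and the type are packed into a single varint tag, `(field_number << 3) | wire_type`, followed by the value. The sketch below walks the top-level fields by hand using only the JDK. The class name `WireDump` and the sample message are hypothetical (the real Goggles field layout is unknown), and groups are not handled:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WireDump {
    private final byte[] buf;
    private int pos = 0;

    WireDump(byte[] buf) {
        this.buf = buf;
    }

    /** Reads one base-128 varint and advances the cursor. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos++];
            result |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint too long");
    }

    /** Returns the next n bytes and advances the cursor. */
    private byte[] readBytes(int n) {
        if (n < 0 || n > buf.length - pos) {
            throw new IllegalArgumentException("truncated field");
        }
        byte[] out = Arrays.copyOfRange(buf, pos, pos + n);
        pos += n;
        return out;
    }

    /** Lists the top-level fields as "fieldNumber: kind value", with no schema. */
    static List<String> dump(byte[] message) {
        WireDump in = new WireDump(message);
        List<String> fields = new ArrayList<>();
        while (in.pos < in.buf.length) {
            long tag = in.readVarint();
            long fieldNumber = tag >>> 3;   // upper bits: the key
            int wireType = (int) (tag & 7); // lower 3 bits: the type
            switch (wireType) {
                case 0: // varint
                    fields.add(fieldNumber + ": varint " + in.readVarint());
                    break;
                case 1: // fixed 64-bit
                    fields.add(fieldNumber + ": 64-bit " + Arrays.toString(in.readBytes(8)));
                    break;
                case 2: { // length-delimited: string, bytes or sub-message
                    int length = (int) in.readVarint();
                    String text = new String(in.readBytes(length), StandardCharsets.UTF_8);
                    fields.add(fieldNumber + ": bytes \"" + text + "\"");
                    break;
                }
                case 5: // fixed 32-bit
                    fields.add(fieldNumber + ": 32-bit " + Arrays.toString(in.readBytes(4)));
                    break;
                default: // groups (3, 4) and the unused values 6 and 7
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical message: field 1 = "Tuborg", field 2 = "Logo", field 3 = 42.
        byte[] sample = {
            0x0a, 0x06, 'T', 'u', 'b', 'o', 'r', 'g',
            0x12, 0x04, 'L', 'o', 'g', 'o',
            0x18, 0x2a
        };
        dump(sample).forEach(System.out::println);
    }
}
```

Length-delimited payloads are printed as UTF-8 text here only as a guess; a payload that looks like garbage is probably a sub-message and can be fed back into `dump` to look one level deeper.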
blackgreen
tgoodhart