When using Google Protocol Buffers to transfer a String, I get garbled output

Question

In debug view:

Here is the code that produces the garbled string:

    ((S2CEnterCollection)objS2c).toByteString().toStringUtf8();

Output:

    ���"default(
    ���"default(
    ���"default(
    ���"default(
    ���"default(
    ����"default(
    ����"default(
    �����"default(

Here is the code that produces the correct string:

    ((S2CEnterCollection)objS2c).toString()

The original string was:

    cardList {
      cardId: 100001
      liked: 100
      number: 10
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100002
      liked: 123
      number: 10
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100003
      liked: 543
      number: 10
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100004
      liked: 766
      number: 10
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100005
      liked: 78
      number: 10
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100006
      liked: 89
      number: 123
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100007
      liked: 199
      number: 567
      finder: "default"
      rank: 1
    }
    cardList {
      cardId: 100008
      liked: 90909
      number: 232
      finder: "default"
      rank: 1
    }

So, does anyone know how it works?

Hi Ryan, you might try adding in the code you used that generated this. Also, what character encoding are you using? — jamesmortensen, Jan 07 '13 at 04:20
Hi @jmort253, I was using UTF-8, which is the default. I also tried new String(((S2CEnterCollection)objS2c).toString().getBytes(Charset.forName("utf-8"))); which worked and gave the expected result. But as you can see, that does not involve the Protocol Buffers serialization at all; it just round-trips the text and returns the same string as ((S2CEnterCollection)objS2c).toString()... – Ryan Zhu, Jan 07 '13 at 04:40
Sorry, my mistake: I wasn't actually using Chinese, just English, but I still got garbled data... – Ryan Zhu, Jan 07 '13 at 04:52
Then you should edit your question title to edit that out. On Stack Overflow, nothing you post is immutable. — jamesmortensen, Jan 07 '13 at 04:54

score 4 · Accepted Answer · answered Jan 07 '13 at 08:11

protobuf data is binary, and is not encoded text. You cannot run it through an encoding like UTF-8 and expect to get a string (or expect it to still be valid). The only way to convert protobuf data to a string is to run it through a base-N encode for some N, typically 64 (because it is well-supported on most platforms).
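
In Java, the difference between the two approaches might look like the minimal sketch below. It relies on java.util.Base64, which is available from Java 8 onward. The SAMPLE array is a hand-written stand-in for the bytes a real message would return from toByteArray(); it is an assumption for illustration, not data from the question, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryToText {
    // ASSUMPTION: stand-in for message.toByteArray(); a varint field 1 holding 100001.
    static final byte[] SAMPLE = {0x08, (byte) 0xA1, (byte) 0x8D, 0x06};

    // Decodes the bytes as UTF-8 text and encodes that text back to bytes.
    // Bytes that are not valid UTF-8 are replaced during decoding, so this is lossy.
    static byte[] viaUtf8(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes arbitrary bytes as Base64 text, which is safe to carry in a String.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from Base64 text.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 round trip intact:  "
                + Arrays.equals(SAMPLE, viaUtf8(SAMPLE)));
        String text = toBase64(SAMPLE);
        System.out.println("Base64 text:              " + text);
        System.out.println("Base64 round trip intact: "
                + Arrays.equals(SAMPLE, fromBase64(text)));
    }
}
```

With a real message, the receiver would decode the Base64 text back to bytes and pass them to the generated parseFrom method of the message class.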

Marc Gravell

Hey, thanks for answering, but what do you mean by "run it through a base-N encode for some N, typically 64"? – Ryan Zhu Jan 07 '13 at 08:18
@RyanZhu let's assume, for simplicity, that N=64; [then see base-64 on wiki](http://en.wikipedia.org/wiki/Base64). I'm not a java person, but in .NET that would be just: `string s = Convert.ToBase64String(binaryData);` or `var binaryData = Convert.FromBase64String(s);` – Marc Gravell Jan 07 '13 at 08:19
Oh, yes, that's right! I used Base64 and it works perfectly! Thanks a lot! – Ryan Zhu Jan 07 '13 at 10:19

James · Answer 2 · 2013-01-07T05:15:59.353

That messy string is likely absolutely correct. The problem is: you're assuming it's a human readable string, and it's not. toByteString(), and I quote:

Serializes the message to a ByteString and returns it. This is just a trivial wrapper around writeTo(CodedOutputStream).

https://developers.google.com/protocol-buffers/docs/reference/java/index - look for MessageLite.

It's the sort of format that you might use to transmit across a network, or something you might store in a file with millions of records. It's not meant to be human readable - it's meant to be a relatively small, machine readable representation. So it does things like use tag identifiers (small numbers) rather than field names, variable length encoding, and various other tricks to minimize size at the expense of readability.
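
The tag and variable-length encoding mentioned above can be sketched in a few lines of Java. This is a simplified illustration of the rules in the encoding documentation, not the library's own code, and the class and method names are hypothetical. It also ASSUMES the fields of cardList are numbered 1 to 5 in the order they are printed (cardId, liked, number, finder, rank); the .proto file is not shown in the question, but that numbering would explain why each record in the garbled output shows `"default(`: the `"` and `(` would be tag bytes rather than text.

```java
import java.io.ByteArrayOutputStream;

public class WireFormatSketch {
    static final int WIRETYPE_VARINT = 0;           // int32, int64, bool, enum, ...
    static final int WIRETYPE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

    // A field's tag is (field number << 3) | wire type.
    // For field numbers below 16 the tag fits in a single byte.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Base-128 varint: 7 payload bits per byte, least significant group first,
    // with the high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // ASSUMPTION: finder is field 4 and rank is field 5 (not confirmed by the question).
        System.out.println("finder tag: " + (char) tag(4, WIRETYPE_LENGTH_DELIMITED));
        System.out.println("rank tag:   " + (char) tag(5, WIRETYPE_VARINT));
        System.out.println("cardId 100001 as varint: " + hex(encodeVarint(100001)));
    }
}
```

Under that numbering assumption, most of the remaining bytes in each record are varints and length prefixes that are either unprintable or invalid as UTF-8, which is why they show up as replacement characters.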

https://developers.google.com/protocol-buffers/docs/encoding

Hi @James, so what I got was not a wrong message. But given the requirements of our program, the client (the iOS side) expects a string containing the binary form of the data, and what they receive is garbled, unreadable, and cannot be decoded into the correct data shown above. Is there an existing way to decode the garbled string back into the expected text? I'd like to encode it and then decode it as a test. Thanks in advance. – Ryan Zhu, Jan 07 '13 at 05:48

score 1 · Answer 3 · answered Jan 08 '13 at 18:19

I prefer to use Google's own com.google.protobuf.TextFormat class, which constructs a human-readable representation of the protobuf object's contents with its "print" methods. In the example below, PayloadContent can be any Message:

    PayloadContent pc = PayloadContent.newBuilder().setContent........build();
    String text = TextFormat.shortDebugString(pc);

If, however, you want to see the "Byte" format, then by all means convert the ByteString representation to Base64, but this is not much use for a human to read :)

Oh, got it! Though I now have several ways to solve this problem, we have changed the return type to ByteString, so we don't have to create a String object. Thanks anyway. – Ryan Zhu, Jan 09 '13 at 05:47
