
I would like to send 1024 bytes of data using Thrift. It must be exactly 1024 bytes because it is a comparative benchmark with other frameworks.

Thrift has two types to represent bytes, 'byte' and 'binary', but I don't know how to use these types. The 'binary' type is mapped to std::string, which is quite strange (I don't understand why or how to use it). The 'byte' type is mapped to an 8-bit integer, which seems more logical to me.

To represent 1024 bytes of data, I use list<byte> byteSequence with a size of 1024.

But a compile warning advises me to use binary instead of list<byte>. Why, and how?

I suspect I will get much better performance with 'binary', because sending a 1024-element list<byte> is strangely slow.

Thank you.

Shastick
B. Clement

2 Answers


But a compile warning advises me to use binary instead of list<byte>. Why, and how?

The 'byte' type is mapped to an 8-bit integer, which seems more logical to me.

And that's exactly why the warning is there. It seems logical but it is the worst choice. Furthermore, byte in Thrift is in fact an i8 - a signed type.

The 'binary' type is mapped to std::string, which is quite strange (I don't understand why).

Don't worry, that's a historical thing. The binary type was added later and implemented similarly to string in some ways, to reduce compatibility friction with older versions. It's just an implementation detail.

but I don't know how to use these types.

Like any other type:

 struct wtf {
   1 : binary foo
   2 : string bar
   3 : byte baz          // i8 is replacing byte to indicate the signedness
   4 : list<byte> qux    // not recommended, but nevertheless works
 }
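On the C++ side, a `binary` field is a `std::string`, and a `std::string` can carry arbitrary bytes, embedded zeros included. Below is a minimal sketch of filling such a field from a raw buffer. `WtfSketch`, `setFoo` and `byteAt` are hypothetical names used only for illustration: the struct is a stand-in for whatever the Thrift compiler generates from `wtf`, not the generated code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the generated struct; only the
// `binary foo` field is mirrored here, as a std::string.
struct WtfSketch {
    std::string foo;
};

// Copy `len` raw bytes into the binary field. assign(pointer, length)
// copies every byte and does not stop at embedded zeros.
inline void setFoo(WtfSketch& msg, const std::uint8_t* data, std::size_t len) {
    msg.foo.assign(reinterpret_cast<const char*>(data), len);
    assert(msg.foo.size() == len);
}

// Read one byte back as an unsigned value (plain char may be signed).
inline std::uint8_t byteAt(const WtfSketch& msg, std::size_t i) {
    return static_cast<std::uint8_t>(msg.foo.at(i));
}
```

Calling `setFoo(msg, buffer, 1024)` leaves `msg.foo.size()` at exactly 1024, whatever the byte values are.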
JensG
  • Thank you for your explanation. When I say "I don't know how to use it", I mean I don't know how to represent exactly 1024 bytes with a string. I did it this way: `string sequence; for (int j = 0; j < 1024; j++) sequence += (char)0;` but I don't really know whether that is a good way. – B. Clement Dec 02 '16 at 08:25
  • `std::string` has some useful constructors, see [this answer](http://stackoverflow.com/a/166646/499466). Something like filling it with 1024 copies of the letter `A` should do the trick. – JensG Dec 02 '16 at 16:27
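The constructor suggestion above can be sketched as follows, next to the append loop from the earlier comment for comparison. `makePayload` and `makePayloadLoop` are illustrative names, not part of Thrift.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fill constructor: `count` copies of `fill`, sized in one step.
// Use parentheses here; braces would select the initializer-list constructor.
inline std::string makePayload(std::size_t count, char fill) {
    return std::string(count, fill);
}

// The append loop from the comment, kept for comparison.
inline std::string makePayloadLoop(std::size_t count, char fill) {
    std::string s;
    s.reserve(count);  // one allocation up front instead of repeated growth
    for (std::size_t j = 0; j < count; ++j) {
        s += fill;
    }
    assert(s.size() == count);
    return s;
}
```

Both functions produce the same string, so `makePayload(1024, '\0')` is a shorter way to get the 1024 zero bytes that the loop builds.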

It probably depends on the language you compile your Thrift files to, but binary tells Thrift directly that you want to transmit a sequence of raw, unencoded bytes.

It may not change things much at the transport layer in terms of size, but you may run into surprises when you instantiate or deserialise the objects in your chosen language. In Java, for example, a binary field will be represented with a byte[], whereas list<byte> will give you a List<Byte>, which is a far less efficient representation of the same thing.
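To put a rough number on the size claim, here is a back-of-the-envelope sketch. It assumes the plain binary protocol layout as I understand it (a 4-byte length before a binary value, and a 1-byte element type plus a 4-byte count before a list); other protocols, such as the compact one, use different headers. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Assumed binary-protocol layout (the field header is the same in both
// cases and is left out):
//   binary     -> 4-byte length + payload bytes
//   list<byte> -> 1-byte element type + 4-byte count + one byte per element
constexpr std::size_t binaryValueSize(std::size_t n) { return 4 + n; }
constexpr std::size_t byteListValueSize(std::size_t n) { return 1 + 4 + n; }

// Extra bytes on the wire when list<byte> is used instead of binary.
inline std::size_t extraBytesForList(std::size_t n) {
    assert(byteListValueSize(n) >= binaryValueSize(n));
    return byteListValueSize(n) - binaryValueSize(n);
}

static_assert(binaryValueSize(1024) == 1028, "4-byte length + 1024 payload bytes");
static_assert(byteListValueSize(1024) == 1029, "5-byte list header + 1024 elements");
```

Under these assumptions the two encodings differ by a single byte for a 1024-byte payload, so any speed difference would come from how the elements are written, read and stored rather than from the amount of data sent.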

Java might be the only reason for binary, since according to the Thrift documentation:

binary: a sequence of unencoded bytes

N.B.: This is currently a specialized form of the string type above, added to provide better interoperability with Java. The current plan-of-record is to elevate this to a base type at some point.

Shastick
  • I am compiling to C++. If I understand correctly, there would be no performance difference between list<byte> and binary in C++? I would like to try it myself, but I don't understand how to use the binary type in C++, because it is mapped to std::string. How can I represent 1024 bytes of data with a std::string? – B. Clement Dec 01 '16 at 08:11
  • I'm no C++ expert, but from a bit of googling it seems that std::string is fine for holding and manipulating binary data, e.g. http://stackoverflow.com/a/837528/1997056 – Shastick Dec 01 '16 at 08:20
  • OK, so I tried this: `string sequence; for (int j = 0; j < 1024; j++) sequence += (char)0;` and I get better performance. I hope this method is good. Thank you for your help. – B. Clement Dec 01 '16 at 08:42
  • That comment may need some revision: `binary` has been a base type for a long time now. – JensG Dec 01 '16 at 15:06