2

I'm interested in socket programming, and I've seen a couple of sources convert data to bytes before sending it over the network, then convert it back to its original form on the receiving end. Why is that? Is it better or faster to send bytes instead of strings or the data itself? I'm really confused about this and didn't find enough info on the web about it. Or is that a networking thing? If so, I'd be grateful if you could link me to something that explains it in detail. Sorry for my limited language, as I'm not a native speaker, and thanks in advance.

2 Answers

6

Sockets transfer raw sequences of bytes. They don't handle anything but that.

So, if you have a higher-level construct (an int, struct, string, etc.) you must "encode" it into a series of bytes that can be transmitted and then "decode" it on the other end.
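For instance, here is a minimal Python sketch of that idea (Python is an assumption on my part, since the question doesn't name a language; a `socket.socketpair()` stands in for a real client/server connection so the example is self-contained):

```python
import socket
import struct

# A connected pair of sockets stands in for a real client/server connection.
sender, receiver = socket.socketpair()

# Sending side: encode an int and a string into raw bytes.
number = 42
text = "hello"
payload = text.encode("utf-8")                     # str  -> bytes (UTF-8)
header = struct.pack("!iI", number, len(payload))  # int + payload length, network byte order
sender.sendall(header + payload)

# Receiving side: decode the raw bytes back into the original types.
# (A real receiver would loop, since recv() may return fewer bytes than asked for.)
number_back, length = struct.unpack("!iI", receiver.recv(8))
text_back = receiver.recv(length).decode("utf-8")  # bytes -> str
print(number_back, text_back)                      # 42 hello
```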

David K. Hess
  • yeah, that sounds understandable, I didn't realize sockets handle only bytes, how lame of me :P anyway, thanks for the help :) –  Aug 30 '14 at 02:36
  • so why do sockets transfer raw sequences of bytes instead of strings? I don't know what's under the hood of sending data from the client side to the server side; I hope you could give some explanation – iMath Jul 19 '18 at 05:29
  • The closer you get to the network, the closer you are getting to hardware. To simplify the implementation of hardware, it typically works in bytes only. – David K. Hess Jul 19 '18 at 13:26
5

Data must be serialized into individual bytes because a socket (like a file handle) is a byte stream, while a String is an object. You can't put a round peg into a square hole.

You can serialize a string into what looks like a string literal, but in truth every string has an encoding, whether it sits in memory, is streamed over the network, or is written to a file. Strings sent over the network are usually encoded in UTF-8, which for ASCII characters looks like just a sequence of bytes, one per character. But any character with a code point above 127 will actually use more than one byte to represent that one character.

So everything must be serialized. Even integers have an encoding: a 4-byte integer could be serialized into a stream of 4 bytes ordered from least significant byte to most significant byte (this is called "little-endian") or in the opposite direction ("big-endian").
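A quick Python illustration of both points (Python is just chosen for demonstration; the byte-level behavior is the same in any language):

```python
import struct

# UTF-8: ASCII characters take one byte each, but "é" (code point 233) takes two.
print("abc".encode("utf-8"))      # b'abc'       -> 3 bytes
print("é".encode("utf-8"))        # b'\xc3\xa9'  -> 2 bytes for one character

# The same 4-byte integer serialized in both byte orders.
print(struct.pack("<i", 1000))    # little-endian: b'\xe8\x03\x00\x00'
print(struct.pack(">i", 1000))    # big-endian:    b'\x00\x00\x03\xe8'
```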

squarewav
  • Oh! I understand now the reason behind converting and how necessary it is, thank you sir for helping –  Aug 30 '14 at 02:31
  • Please note that by convention "*Big-Endian*" ought to be used as the "*Network Byte Order*". For reference: http://en.wikipedia.org/wiki/Endianness and http://stackoverflow.com/a/13514942/694576 – alk Aug 30 '14 at 07:23
  • thanks a lot, I figured out the difference between big- and little-endian, but is one of them better than the other? And if so, why? Or is it only a byte ordering, just like which end we have to crack the egg from, the big end or the little one! –  Aug 30 '14 at 11:39
  • It only matters that both sides agree on the endianness of integers. Although if you are designing a protocol, I would use big-endian so that when you look at the raw hex (also known as "on-the-wire"), the byte sequence reads like a literal. Meaning a 4-byte encoding of the number 3 in big-endian would look like 00000003 on the wire, whereas in little-endian it would look like 03000000, which is harder to "see". – squarewav Sep 08 '14 at 18:30
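To make the on-the-wire comparison in that last comment concrete, here is a tiny Python snippet (again, Python is only used for illustration) showing how the number 3 reads in hex under each byte order:

```python
# Hex dump of the 4-byte integer 3 in each byte order.
print((3).to_bytes(4, "big").hex())     # '00000003' -> reads like the literal 3
print((3).to_bytes(4, "little").hex())  # '03000000' -> harder to "see"
```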