
I used to think that the reason for encoding binary data is that every device has its own way of interpreting bytes: if one router sends a byte as some significant piece of information, another router might treat that byte as a parity byte or something else. But isn't all of that already covered by character encoding? I mean, a character encoding tells you which bytes represent which characters, right? (Or am I missing something?) Isn't knowing the character encoding (like UTF-8) enough for devices to read the bytes directly? If yes, why would anyone want to encode the data further (using something like Base64), since that increases the amount of data that has to be transferred?
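A minimal Python sketch of the trade-off being asked about (the three example bytes are arbitrary placeholders, not taken from any real protocol): raw binary bytes are frequently not valid UTF-8 at all, and Base64 maps every 3 input bytes to 4 ASCII characters, so the payload grows by roughly a third but can pass safely through text-only channels:

    import base64

    # Three arbitrary binary bytes -- e.g. a fragment of an image or a
    # compressed file (illustrative values only).
    raw = bytes([0x00, 0xFF, 0xC3])

    # "Just read them with a character encoding" fails, because not every
    # byte sequence is valid UTF-8:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc)

    # Base64 turns every 3 bytes into 4 characters drawn from a 64-character
    # ASCII alphabet: about 33% bigger, but plain text that e-mail bodies,
    # JSON strings, URLs, etc. can carry without corruption.
    encoded = base64.b64encode(raw)
    print(encoded, len(raw), "->", len(encoded))   # b'AP/D' 3 -> 4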

Harshank Bansal
  • You can't encode arbitrary binary data as UTF-8. – tkausl Jan 23 '19 at 17:26
  • Trick question: how many bytes are there in a UTF-8 character? Bonus: how are types like "characters" stored on a computing device? How are they transferred? – jdv Jan 23 '19 at 18:10
  • Not sure I understand. Computers work with binary, so we often use binary encodings (more efficient for the computer). But we also have a lot of text protocols (e-mail [SMTP], web [HTTP], etc.) that send text and expect text as an answer. Modems worked the same way: you sent AT commands. SQL is text (so you convert binary to text, send it to the database, and the database converts it back to binary). Nowadays we use a lot of JSON, also text; in the past we exchanged data with CSV. Different uses, but we still use both. – Giacomo Catenazzi Jan 23 '19 at 19:16
  • @jdv The number of bytes can vary: ASCII characters still take 1 byte, and it grows depending on which characters are being encoded (see the sketch after these comments). Characters are stored in binary. And the third question is what I am actually confused about: why can't these characters be transferred as bits and bytes after being encoded with UTF-8? – Harshank Bansal Jan 24 '19 at 08:17
  • @GiacomoCatenazzi So to send data we convert binary to text, and then this text is converted back into binary so that the machines can understand it... – Harshank Bansal Jan 24 '19 at 10:15
  • Not always, but many protocols were designed that way: they are easy to debug and more portable (I can use TELNET to talk to SMTP, HTTP, or IRC servers when debugging). They are also much easier to program portably [no worrying about the size of int, handling of negative numbers, etc.]. But there are fully binary protocols too (and some file formats allow both endiannesses; unfortunately most programs can read only one). Sometimes we apply several text conversions: Unicode to UTF-8 (binary), which is then encoded with Base64 (so only the 64 most common ASCII characters), then put in a MIME section, etc. (see the sketch after these comments). – Giacomo Catenazzi Jan 24 '19 at 10:42
  • IP addresses are sent in every TCP/IP packet as binary (to be compact), and every machine has to interpret them the same way: a 32-bit number in "network byte order", i.e. big endian (see the sketch after these comments). Packet headers have to be small (so they can be handled by old routers), and there are many packets. But the protocols built on top of TCP are often text. So you see, there is no hard rule: it is up to the programmer to choose the encoding, escaping, and protocol, thinking about efficiency, portability, and long-term maintainability. – Giacomo Catenazzi Jan 24 '19 at 10:46
  • @GiacomoCatenazzi Got it now, thanks for the explanation. Sorry, I don't have any reputation points, so I can't upvote the comments... – Harshank Bansal Jan 24 '19 at 13:46
  • @HarshankBansal I think you missed my meaning, which is that character data is an abstraction on top of binary data. It isn't that ASCII is 1 byte and some other encoding is 2 bytes. Think about my trick question a bit more. Byte order and byte run lengths are _both_ important when encoding _any_ data exchange format. And, at the end of the day, characters are basically data exchange. In short, I am also suggesting a subtly harder question: what _is_ "binary" data? The answer isn't as straightforward as often presented, and is highly contextual. – jdv Jan 24 '19 at 16:18
  • Aside: Unless name could be NULL, making both person and company NULL, consider that dtype is no longer needed. – Tom Blodget Jan 27 '19 at 22:46
  • @jdv Thanks for the explanation, the picture is much clearer in my head. I still can't figure out why you asked the binary question, though. I think binary is what computers understand: everything in a computer is stored in binary, so it could be called the machine-level abstraction of characters. – Harshank Bansal Feb 15 '19 at 21:17
  • @HarshankBansal We use the term "binary" as shorthand for a lot of things, even in terms of data storage. What do you mean by "computer"? The CPU? The data bus? Control chips? The operating system? A magnetic drive? A program you've written? Each of these is, in some manner, an abstraction, and abstractions have layers where they meet other abstractions. My point is that there is no concrete type or format called "binary", though we use this word as a sort of shorthand for collections of words, or bytes, or electrical states. – jdv Feb 15 '19 at 21:32
  • @HarshankBansal It's a great question, and asked very well. I'm not sure why no one gave a proper answer to this. I too had the same question and wasn't getting the answer anywhere. I just found the answer over here: https://stackoverflow.com/a/201510/169513 I'm not sure whether you are still looking for the answer, but hope this helps! – Mugen Sep 02 '22 at 14:07
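To make jdv's trick question above concrete, here is a small Python sketch (the sample characters are chosen purely for illustration): a UTF-8 "character" has no fixed size, it occupies 1 to 4 bytes depending on the code point, which is why a receiver must know the encoding to find character boundaries in a byte stream.

    # Each character below encodes to a different number of bytes in UTF-8.
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        print(repr(ch), "->", len(encoded), "byte(s):", encoded.hex(" "))

    # 'A'  -> 1 byte(s): 41
    # 'é'  -> 2 byte(s): c3 a9
    # '€'  -> 3 byte(s): e2 82 ac
    # '😀' -> 4 byte(s): f0 9f 98 80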
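And a sketch of the text-conversion chain Giacomo Catenazzi mentions (Unicode text → UTF-8 bytes → Base64 ASCII, as used e.g. inside MIME); the sample string is an illustrative assumption, not taken from the post:

    import base64

    text = "Größe: 5 µm"                  # any non-ASCII text will do
    utf8_bytes = text.encode("utf-8")      # Unicode code points -> bytes
    b64 = base64.b64encode(utf8_bytes)     # bytes -> "safe" ASCII characters
    print(b64.decode("ascii"))             # R3LDtsOfZTogNSDCtW0=

    # The receiver simply reverses the two steps:
    assert base64.b64decode(b64).decode("utf-8") == text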
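Finally, a sketch of the "network byte order" point about IP addresses, using Python's standard socket and struct modules (the address 192.0.2.1 is a documentation/test value chosen for illustration):

    import socket
    import struct

    # An IPv4 address travels in the packet header as a raw 32-bit
    # big-endian ("network byte order") integer, not as dotted text.
    packed = socket.inet_aton("192.0.2.1")
    print(packed.hex(" "))                              # c0 00 02 01

    # Every machine converts between network order and its own host order;
    # "!" in the struct format string means network (big-endian) order.
    (as_int,) = struct.unpack("!I", packed)
    print(hex(as_int))                                  # 0xc0000201
    print(socket.inet_ntoa(struct.pack("!I", as_int)))  # 192.0.2.1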

0 Answers