
When using the Berkeley sockets API, what is the data type of the content that is sent over the send/write or recv/read calls? For example:

char *msg = "Our Message!";
int len, bytes_sent;
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);

In this code we are using the char type, but are we limited to just char, since send/write/sendto usually take a void * type? I've also seen arguments that if we send some int, it might actually be stored in little-endian or big-endian order, causing problems between source and destination if their endianness doesn't match. Then why doesn't the char type suffer from this problem too?

Also, different languages like C and C++ have different sizes of char too, so why isn't this a problem? If the socket doesn't care about type and just sees the content as a buffer, why don't we see random corruption of data when TCP servers/clients written in different languages communicate with each other?

In short, what values (types) can I send safely through sockets?

hg_git
  • "Also different languages like C and C++ have different size of char" No, in both C and C++ the size of a char is 1. –  Mar 04 '17 at 23:24
  • @NeilButterworth oh so the literal 'a' must have size == 4? – hg_git Mar 04 '17 at 23:26
  • Safest way you can go is to send a buffer of bytes and `char` is always 1 byte. All languages designed for this purpose have such a data type. – DeiDei Mar 04 '17 at 23:26
  • 1
    C and C++ are different languages. Clarify your quesiton and state your **specific** problem. Then pick a language and remove the unrelated tag. – too honest for this site Mar 04 '17 at 23:26
  • @Olaf both languages are at concern here since I've code in c++ but socket api is in C. – hg_git Mar 04 '17 at 23:27
  • 1
    In C and C++, the literal 'a' must have a size of 1. –  Mar 04 '17 at 23:29
  • @NeilButterworth http://stackoverflow.com/questions/2172943/size-of-character-a-in-c-c – hg_git Mar 04 '17 at 23:29
  • 1
    The API is irrelevant. An object file has no source code and using a library function with C ABI does not justify the C tag! – too honest for this site Mar 04 '17 at 23:30
  • OK, that's the size of a character literal (it's a misfeature of C) - the size of the characters in a string like "foobar", which is probably what you are interested in transmitting, is 1 in both languages. –  Mar 04 '17 at 23:31
  • @hg_git: Read the question and accepted answer **carefully** (and understand them). A character literal is not a `char` in C! – too honest for this site Mar 04 '17 at 23:32
  • @NeilButterworth: It is a legacy and one reason there should be only one language tagged typically. – too honest for this site Mar 04 '17 at 23:33
  • 2
    @Olaf This is obviously C code, unless you think the Berkley socket library is written in C++, and if anything it's the C++ tag that should be removed. In fact, both tags are perfectly fine, so please stop removing the C tag. –  Mar 04 '17 at 23:43
  • Maybe we should just tag this language agnostic and walk away. Anything you put on a wire in any language has to fit a mutually agreed upon protocol or Crom only knows what's going to happen. – user4581301 Mar 04 '17 at 23:53
  • 1
    @NeilButterworth 1) Wrong: character constants have type `int` in C, thus they can have different size than `char`. 2) OP stated he uses C++, so the code is C++, not C. 3) If the library you use would be relevant **every** C++ question would justify the C tag, because it eventually calls C code somewhere. 4) A library has no source code, but follows an ABI. Please stop adding irrelevant tags. – too honest for this site Mar 05 '17 at 00:50
  • @user4581301: I'm fine with that; TCP provides an octet stream (which is not even the same as the C or C++ `char` types). Problem is it is not clear what the OP's problem is, and he seems to have lost interest in providing additional information. – too honest for this site Mar 05 '17 at 00:53

2 Answers

5

You cannot safely send anything through a raw socket and expect the receiver to make sense of it. For example, the sending process might be on a machine where the character encoding is EBCDIC, and the receiving process might be on a machine where the character encoding is ASCII. It's up to the processes either to negotiate a protocol to sort this out, or to simply say in their specifications "We are using ASCII (or whatever)".

Once you have got the character encodings worked out, my advice is to transmit the data as text. This avoids all endianness problems, and is easier to debug and log.
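A minimal sketch of the text approach: format the value as a decimal string on the sender and parse it back on the receiver, so both ends agree regardless of native endianness. The `sockfd` in the comment is assumed to be an already-connected TCP socket; the encode/decode helper names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode value as a newline-terminated decimal string into buf.
 * Returns the number of characters written (excluding the NUL). */
static int encode_int(char *buf, size_t cap, long value) {
    return snprintf(buf, cap, "%ld\n", value);
}

/* Decode a newline-terminated decimal string back into a long. */
static long decode_int(const char *buf) {
    return strtol(buf, NULL, 10);
}

/* On the sender:   int len = encode_int(buf, sizeof buf, 42);
 *                  send(sockfd, buf, len, 0);
 * On the receiver: recv(sockfd, buf, sizeof buf - 1, 0);
 *                  long v = decode_int(buf);                 */
```

The newline doubles as a record delimiter, which matters because TCP is a stream and one recv() call may return more or less than one "message".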

user207421
4

The simplest answer is that the data is an uninterpreted stream of octets, that is to say 8-bit bytes. Any interpretation of it is done by the sender and receiver, and they had better agree. You certainly need to take both the size and endianness of integers into account, and compiler alignment and padding rules too. This is why, for example, you should not use C structs as network protocols.
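To make "agree on the octets" concrete, here is a sketch of serializing a 32-bit integer into a fixed big-endian byte layout by hand, instead of sending a struct whose padding and endianness vary by platform. The helper names are invented for this example; the byte buffer they produce is what you would pass to send().

```c
#include <stdint.h>

/* Write v into out[0..3], most significant byte first
 * ("network byte order"), independent of host endianness. */
static void put_u32_be(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* Read the same four-byte layout back into a uint32_t,
 * again independent of host endianness. */
static uint32_t get_u32_be(const unsigned char *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because both sides shift bytes explicitly rather than memcpy'ing a struct, a little-endian x86 sender and a big-endian receiver reconstruct the same value. (htonl()/ntohl() do the same job for the common 32-bit case.)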

user207421