
This is based on an earlier question I asked. Currently I am sending an octet of zero bits from one Linux machine to another over a socket, like this:

const char null_data(0);
send(newsockfd,&null_data,1,0);

My question is: will this work the same when sending to a Windows machine (64-bit), or will I have to change the code?

James Franco
  • Why would it be different? 0 is 0 regardless of platform. – Krease Jan 06 '16 at 00:16
  • `sizeof(char)` is not guaranteed to always equal 1. `char` can be multibyte. – scooter me fecit Jan 06 '16 at 00:23
  • It is. If it's 4 octets then multiply the return of sizeof by 4 to have the size in octets. – mikedu95 Jan 06 '16 at 00:26
  • @ScottM The C++ Standard guarantees the size of `char` to be 1 byte even if a single byte is larger than 8 bits. – Captain Obvlious Jan 06 '16 at 00:27
  • @CaptainObvlious: Compiler flags exist to assume multibyte characters, hence the need for `stdint.h` and `cstdint`. I'd have to search a bit, but I seem to recall a MSVC flag that aliases `char` to `wchar_t` (I'm working off memory here.) – scooter me fecit Jan 06 '16 at 00:35
  • @ScottM: No, it has a type `TCHAR` that can be either 8 or 16 bits. Type `char` is always 8 bits under MSVC, and is always by definition 1 byte (`CHAR_BIT` bits) under any conforming C or C++ implementation. – Keith Thompson Jan 06 '16 at 00:45
  • @KeithThompson: True for the WinXX platform that TCHAR is supposed to be used in place of `char` when MBCS support is turned on. Still safer for the OP to explicitly use `uint8_t` instead of `char`: it tells the reader exactly what's being sent and how large the quantity is. Reduces ambiguity. – scooter me fecit Jan 06 '16 at 00:51
  • @ScottM: You are confusing character encoding with type sizes. `sizeof(char)` is equal to 1, regardless of compiler settings. A codepoint in Unicode can be encoded using multiple code units. – IInspectable Jan 06 '16 at 00:51
  • @IInspectable: See the standard spec quote in the other answer. The standard doesn't guarantee that `sizeof(char) * 8 == 1 * 8`, just that it has to be sufficiently large. Now, as pointed out, it is generally universal that `sizeof(char) * 8 == 1 * 8`, but there's a lot of value added to being explicit with using `uint8_t`, where the octet guarantee holds. – scooter me fecit Jan 06 '16 at 00:55
  • @ScottM: If `CHAR_BIT > 8`, then `uint8_t` will not exist. You can use that as a kind of compile-time assertion that `CHAR_BIT==8` -- though an explicit `#if` and `#error` might be clearer. Note also that `uint8_t` is unsigned, while `char` may be either signed or unsigned. – Keith Thompson Jan 06 '16 at 00:56
  • @ScottM: `sizeof(char)` is **defined** to be equal to 1. Each and every conforming C++ implementation has to go by that. This implies, that `sizeof(char) * 8 == 1 * 8`. Always. You are talking about something different, but it is not clear, what you are talking about. – IInspectable Jan 06 '16 at 00:58
  • @KeithThompson: Looking through header files, quite a few compilers define `uint8_t` as a compiler internal data type and not necessarily as `unsigned char` (which is probably what the original OP code should have used instead of `char`.) It can be compiler implementation dependent. – scooter me fecit Jan 06 '16 at 01:02
  • See [this code monk](https://drj11.wordpress.com/2007/04/08/sizeofchar-is-1/) post for an example discussion of when `sizeof(char) == 1`, but the byte size is 16 on the TI C54x DSP. – scooter me fecit Jan 06 '16 at 01:05
  • @ScottM Those compiler flags have dick to do with the Standard, I suggest you read it. You can argue about it all you want, but until the Standard is changed your perception about the size of `char` is wrong. – Captain Obvlious Jan 06 '16 at 01:13
  • @CaptainObvlious: Agreed that compiler flags have little to do with the standard, but then again, the standard doesn't define a byte as 8 bits either. `CHAR_BIT` doesn't have to equal 8 and I've worked on platforms where it isn't (retro PDP-8 platforms and custom embedded devices by obscure contractors.) – scooter me fecit Jan 06 '16 at 01:17
  • It doesn't matter if a byte is 1 bit or 1000 bits, it's still _one_ byte and `sizeof` returns the size of a type in....wait for it......waiiiiit for.....bytes! Feel free to continue grasping at straws until you find a hair to split. – Captain Obvlious Jan 06 '16 at 01:20
  • @CaptainObvlious: There's a difference between a standard-conforming and a nonstandard compiler. Seriously. – scooter me fecit Jan 06 '16 at 01:22
  • @ScottM: MSVC is (and always has been) standard-conforming in this respect. Seriously, give it a rest. You were wrong. No big deal, we all make mistakes. But really, trying that desperately to make the wrong sound right is very annoying. – IInspectable Jan 06 '16 at 01:26
  • @IInspectable: My point has been and remains that `sizeof(char)` is not an octet on all platforms and that I've run into non-portable environments where `sizeof(char) != 1` despite what the standard says. Plain and simple. Hence the caution on assuming that everything is standards compliant. Read code monk's post for a reasonable summary. – scooter me fecit Jan 06 '16 at 01:32
  • @ScottM: It's certainly true that `char` is not necessarily an octet (more precisely, that `CHAR_BIT` is not necessarily equal to 8). It's commonly 16 or 32 on DSPs. As for your claim that there are non-conforming environments where `sizeof (char) != 1`, I am skeptical. We can agree that such an environment would be non-conforming, but I've never heard of such a non-conforming environment that actually exists. Can you cite a specific example? (Do you now agree that MSVC does not qualify?) The code monk post does *not* support this claim. – Keith Thompson Jan 06 '16 at 01:35
  • @ScottM: Please do read code monk's post again, to understand what it is really saying. There is no mention of any platform, where `sizeof(char)` would be different from 1. The blog post contains numerous samples, where people take that to mean: *"a char is a byte is an octet"*. That's where things go wrong, and I presume, that's your understanding as well. I never claimed, that a `char` (or a byte in C++) would be exactly 8 bits, and yet you keep repeating, what I already know (and never publicly challenged). – IInspectable Jan 06 '16 at 01:57

2 Answers


The trick here is to use the uint*_t data types, insofar as feasible:

#include <cstdint>

/* ... */

#if !defined(_WIN32)
// *nix variant
typedef int socket_fd_t;
#else
// WinXX socket descriptor data type.
typedef SOCKET socket_fd_t;
#endif

void send_0_byte(socket_fd_t newsockfd)
{
    uint8_t zero_byte(0);
    // The cast satisfies both the POSIX (const void *) and Winsock
    // (const char *) signatures of send().
    send(newsockfd, reinterpret_cast<const char *>(&zero_byte), 1, 0);
}

You probably want to add some error-checking code and include the correct platform socket header. uint8_t is an 8-bit quantity (an octet) by definition, which meets your requirement and avoids potential char size issues.
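
If you would rather make the 8-bit-byte assumption explicit, a minimal compile-time guard (just a sketch, assuming C++11 or later) could look like this:

#include <climits>

static_assert(CHAR_BIT == 8, "this protocol code assumes 8-bit bytes");
// Alternatively, merely using uint8_t already fails to compile on a
// platform where it does not exist (i.e. where a byte is not exactly 8 bits).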

On the receiver side, you want to recv into a uint8_t buffer.
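
For symmetry, a minimal receiver-side sketch under the same assumptions (it reuses the socket_fd_t typedef and headers from above; recv_0_byte is just an illustrative name):

bool recv_0_byte(socket_fd_t newsockfd)
{
    uint8_t byte = 0xFF;
    // The cast satisfies both the POSIX (void *) and Winsock (char *)
    // signatures of recv().
    return recv(newsockfd, reinterpret_cast<char *>(&byte), 1, 0) == 1
           && byte == 0;
}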

scooter me fecit

You send 1 char. The char is defined in the standard to be 1 byte in size. Fortunately, on all the C++ implementations I know of, a byte is 1 octet (8 bits), so you should always get the same result.

Note however that the standard does not define the size of a byte:

1.7/1 The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined.

This means that the sending and receiving machines/architectures/implementations do not necessarily have the same understanding of the number of octets to be sent/received. For example, if some future implementation were to define a byte as 2 octets (perfectly valid according to the standard, although improbable), you could in theory run into trouble.

The real problems start if you use larger integers, as you'll have to cope with potentially different endianness. It is even worse for floating-point data, as the encoding is not specified by the standard.
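
To illustrate the integer case, one common approach (a sketch using the BSD/Winsock byte-order helpers; to_wire/from_wire are just illustrative names) is to convert multi-byte values to network byte order before sending and back again after receiving:

#include <cstdint>
#if defined(_WIN32)
#include <winsock2.h>    // htonl()/ntohl() on Windows
#else
#include <arpa/inet.h>   // htonl()/ntohl() on POSIX
#endif

// Big-endian (network order) on the wire, host order in memory.
uint32_t to_wire(uint32_t host_value)   { return htonl(host_value); }
uint32_t from_wire(uint32_t wire_value) { return ntohl(wire_value); }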

Christophe
  • `sizeof(char)` is **defined** to be 1. For all standards conforming C++ implementations. – IInspectable Jan 06 '16 at 00:18
  • `sizeof(char)` is always 1. If you are talking about the size of char in bits, it's `CHAR_BIT` and it's defined in `<climits>` and it is always >= 8. – mikedu95 Jan 06 '16 at 00:22
  • @IInspectable yes, sorry, I wanted to highlight that a char was not necessarily an octet. I'll correct. – Christophe Jan 06 '16 at 00:22
  • @Christophe While it's *technically* possible for `char` to be more than 8 bits (but not fewer), [there are very few platforms where that's the case](http://stackoverflow.com/questions/2098149/). Unless you're working with certain unusual embedded platforms, it's not something you're likely to ever encounter. –  Jan 06 '16 at 00:28
  • @duskwuff I've edited my wording. That's exactly my point: in principle it should work; however, strictly taking the standard's definition, there is a risk. Personally I think that no implementation nowadays will dare to use a larger byte size, because of the millions of lines of code that currently assume that a char is 8 bits and that it's the common denominator between different platforms. – Christophe Jan 06 '16 at 00:41
  • What risk are you talking about? The standard says only that a `char` is exactly 1 byte, which is `CHAR_BIT` bits, where `CHAR_BIT >= 8`. I know of no C or C++ implementation that violates these requirements. – Keith Thompson Jan 06 '16 at 00:48
  • @KeithThompson: The risk is, that a byte is not sufficiently well defined (at least 8 bits). When transferring data across machine boundaries, you are crossing platforms, that may have different byte sizes (in theory). Using a fixed size integer (e.g. `uint8_t`) is unambiguous, and should be used, when a protocol relies on exact sizes. – IInspectable Jan 06 '16 at 00:55
  • Even if program A uses 35 bits for a char, the send function will only care about the first 8, because the unit of a TCP/IP message is **8** bits. Then on the side of program B that receives the message, it doesn't matter if char is 14 bits, since it's >= 8 bits, so it can handle any TCP/IP char. – mikedu95 Jan 06 '16 at 00:59
  • @KeithThompson Unfortunately, the definition of byte in the standard is not specific enough. I agree that it's not probable to be different from 8 bits, but you have no guarantee that it might not change. As a similar anecdote: a very long time ago, most C compilers used unsigned chars and many people assumed chars had to be always unsigned. This is why today you have weird things like [`isalpha()`](http://en.cppreference.com/w/cpp/string/byte/isalpha) which is defined with a signed int as parameter that is required to be between 0 and 255. – Christophe Jan 06 '16 at 01:03
  • @mikedu95 that's an interesting remark, and one reason why implementations will not switch to 35 bits any time soon ;-) But if they did switch to 32 bits, they could perfectly well send 4x8 bits and translate the 1-byte length into 4 octets at OS level. – Christophe Jan 06 '16 at 01:10
  • @mikedu95: Those comments are interesting. And wrong. `send` is implemented by the platform, and it can choose to map a byte in C++ to the required number of bits to be handed off to the lower layers for transfer. Whether a byte has to be split across several message units or not is an implementation detail. – IInspectable Jan 06 '16 at 01:22