3

Say I want to send a 4-byte integer over the network. The integer has a fixed size because it uses a type from stdint. My question is: does it matter whether I send a signed or an unsigned integer in these 4 bytes (assuming I use the same method to serialize/deserialize the integer to/from bytes on both the client and the server)? Can there be other problems? (I am not asking about endianness issues.)

  • signedness is just interpretation. – Mitch Wheat Jul 12 '14 at 01:00
  • @MitchWheat: I got confused after reading this comment: "Signed integers are OK too, unless your machine doesn't use a two's complement representation." (specifically here: http://stackoverflow.com/questions/8000851/passing-a-struct-over-tcp-sock-stream-socket-in-c) –  Jul 12 '14 at 01:01
  • There is only one possible problem: Different interpretation by sender and receiver. See [endianness](https://stackoverflow.com/tags/endianness/info) as well as, because you ask about signed numbers, ones-complement, twos-complement and sign-and-magnitude. – Deduplicator Jul 12 '14 at 01:07

3 Answers

4

This issue seldom gets the attention it deserves.

As Floris observes, only the bytes of the representation get sent. C and C++ define the bitwise representation* of unsigned numbers, but not signed ones, so sending signed numbers as bytes opens a compatibility gap.

It's easy to "fix" the format for transmission. Casting a signed int to its corresponding unsigned type is guaranteed to generate the two's complement representation. But how to convert back? Casting an unsigned integer to its signed counterpart is an out-of-range conversion whenever you want a negative number, and its result is implementation-defined rather than fixed by the standard, so you cannot rely on it portably.

To be really safe, use a branch:

#include <limits.h>

signed int deserialize_sint( unsigned int nonnegative ) {
    if ( nonnegative <= INT_MAX ) return nonnegative; // Fits in an int: the conversion preserves the value
    else return - (int) ( - nonnegative ); // Negate as unsigned first, so the cast sees a small magnitude
}

With luck, the compiler will see that both cases are the same and eliminate the branch.

The above function is written in C; apologies to the C++ crowd.

If you want to be extra paranoid, you could check -nonnegative <= INT_MAX before performing the cast, because the most negative two's complement number fails that check: it cannot be represented at all on a one's complement machine, and negating it overflows even on a two's complement one. The best you can do for the case of nonnegative == -nonnegative is to return a wider type, or if that's impossible, flag a runtime error.

* Endianness becomes ambiguous when the bits are divided into a byte sequence, though.
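For illustration, here is a minimal round-trip sketch of the idea above. It is not the code from this answer: the helper names `put_i32` and `get_i32` are hypothetical, the wire format is assumed to be 32-bit two's complement in big-endian byte order, and the decoder uses a slightly different identity than the double negation (subtract from `UINT32_MAX`, then negate and subtract one) so that the most negative value also decodes without overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers. Assumed wire format: 32-bit two's complement,
   most significant byte first. */

/* Encode: converting to uint32_t is defined modulo 2^32, which yields the
   two's complement bit pattern regardless of the host representation. */
void put_i32(unsigned char out[4], int32_t value) {
    assert(out != NULL);
    uint32_t u = (uint32_t)value;
    out[0] = (unsigned char)((u >> 24) & 0xFFu);
    out[1] = (unsigned char)((u >> 16) & 0xFFu);
    out[2] = (unsigned char)((u >> 8) & 0xFFu);
    out[3] = (unsigned char)(u & 0xFFu);
}

/* Decode: rebuild the unsigned value, then map it back to a signed one
   without ever casting an out-of-range value. */
int32_t get_i32(const unsigned char in[4]) {
    assert(in != NULL);
    uint32_t u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (u <= (uint32_t)INT32_MAX) {
        return (int32_t)u; /* non-negative: value-preserving conversion */
    }
    /* Here UINT32_MAX - u is at most INT32_MAX, so the cast is safe.
       -(x) - 1 then reaches down to INT32_MIN without overflowing. */
    return -(int32_t)(UINT32_MAX - u) - 1;
}
```

Calling `put_i32(buf, -2)` fills `buf` with `FF FF FF FE`, and `get_i32(buf)` gives `-2` back. Because `int32_t` is itself required to be two's complement when it exists, the decoded result always fits.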

Potatoswatter
  • I'm having trouble understanding this. When does `-(int)(-nonnegative)` give a different result than `(int)nonnegative`? – Nairou May 24 '20 at 17:09
  • @Nairou The cast to int is defined to preserve numeric value. Therefore it can never yield a well-defined negative value from a positive value. The double negation avoids overflow on numeric conversion. – Potatoswatter May 24 '20 at 17:41
  • I know this is being done because "signed overflow" is undefined, but does this mean that "unsigned overflow" and "unsigned to signed cast" (or "signed underflow due to cast from unsigned") are both defined? – Nairou May 24 '20 at 18:56
  • @Nairou Right, there's no such thing as unsigned overflow, because the `-` operator is defined to do two's-complement negation yielding a smaller positive value. Then the unsigned to signed cast is fine because the positive value is small enough to be in an `int`. – Potatoswatter May 24 '20 at 19:08
2

When you send a number over a socket, it's just bytes.

Now if you want to send a negative number, and the representation of negative numbers is different at the receiving end, then you might have a problem. Otherwise, it's just bytes.

So if there is a chance that the binary representation of the negative number would be misunderstood at the receiving end, then you need to do some translating (maybe send a sign byte followed by four magnitude bytes, and put it all together at the other end).

That's quite unlikely though.
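In case it helps, the "sign byte followed by four magnitude bytes" idea could be sketched roughly like this. The names `put_signmag`, `get_signmag` and `SIGNMAG_LEN` are made up for the example, and the big-endian order of the magnitude bytes is an assumption; any order works as long as both ends agree on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-byte format: byte 0 is the sign (0 = non-negative,
   1 = negative), bytes 1..4 are the magnitude, most significant first. */
enum { SIGNMAG_LEN = 5 };

void put_signmag(unsigned char out[SIGNMAG_LEN], int32_t value) {
    assert(out != NULL);
    /* Widen before negating so INT32_MIN does not overflow. */
    uint32_t mag = (value < 0) ? (uint32_t)(-(int64_t)value) : (uint32_t)value;
    out[0] = (value < 0) ? 1u : 0u;
    out[1] = (unsigned char)((mag >> 24) & 0xFFu);
    out[2] = (unsigned char)((mag >> 16) & 0xFFu);
    out[3] = (unsigned char)((mag >> 8) & 0xFFu);
    out[4] = (unsigned char)(mag & 0xFFu);
}

/* Returns 0 on success, -1 if the sign byte is invalid or the value
   does not fit in an int32_t. A "negative zero" decodes as 0. */
int get_signmag(const unsigned char in[SIGNMAG_LEN], int32_t *value) {
    assert(in != NULL && value != NULL);
    if (in[0] > 1) {
        return -1;
    }
    uint32_t mag = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
                   ((uint32_t)in[3] << 8) | (uint32_t)in[4];
    int64_t wide = in[0] ? -(int64_t)mag : (int64_t)mag;
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return -1;
    }
    *value = (int32_t)wide;
    return 0;
}
```

This costs one extra byte per integer compared with sending the raw four bytes, which is the price of not depending on how either machine stores negative numbers.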

Floris
  • @Floris: I see, that's it? In that sense, it is safer to use only unsigned ints when sending over the network, right? –  Jul 12 '14 at 01:09
  • No, it's necessary to send the bytes of a multi-byte value in a known order. There are functions to translate integer values to and from "network byte order" in C for this exact reason. – Ⴖuі Jul 12 '14 at 01:13
  • Use `htonl()` [and family](http://linux.die.net/man/3/htonl) to translate the integer to and from host specific representations. – Martin York Jul 12 '14 at 01:15
  • @LokiAstari: I was not referring to endianness issues, but rather to issues related to sign, because these haven't been mentioned in some well-received answers such as this one: http://stackoverflow.com/questions/1577161/passing-a-structure-through-sockets-in-c –  Jul 12 '14 at 01:16
2

Because the standard does not mandate a particular representation for signed types:

3.9.1 Fundamental types [basic.fundamental] Paragraph 7 of n3936

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system. [ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ]

Sending signed integer values in a binary representation is not well defined (unless you explicitly specify this as part of your protocol and do some manual work to make sure you know how to read/write that binary representation).

There are a couple of solutions depending on the exact requirements.

  • If speed is not the primary concern, you could use an English (substitute the language of your choice) representation and serialize integers to/from text. For a lot of problems this is not a bad solution, because the major speed bump is usually network latency rather than the serialization cost (but not always).
  • Alternatively, if you need a binary representation (because you timed it and the volume/density of your numbers requires it), the endianness problem is not hard to solve because of htonl() and family, which cover unsigned integral types (well, at least 16- and 32-bit values).
    • So all you really need to solve is the representation of signed values. Pick one (use the most common representation among the machines you use, and the translation will then usually be a no-op). Once the on-the-wire representation is known (because it is specified in your protocol), you can translate to/from it on machines that do not natively support it; this cost is usually small (a conditional addition).
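As a rough sketch of the text option from the first bullet, assuming decimal ASCII on the wire and a NUL-terminated buffer on the receiving side (both are assumptions for this example, as are the helper names `i32_to_text` and `i32_from_text`):

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes `value` as decimal text into `out`.
   Returns the number of characters written (excluding the NUL),
   or -1 if the buffer is too small. */
int i32_to_text(char *out, size_t cap, int32_t value) {
    assert(out != NULL);
    int n = snprintf(out, cap, "%" PRId32, value);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Hypothetical helper: parses a NUL-terminated decimal string.
   Returns 0 on success, -1 on malformed or out-of-range input. */
int i32_from_text(const char *text, int32_t *value) {
    assert(text != NULL && value != NULL);
    char *end = NULL;
    errno = 0;
    long long parsed = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return -1; /* no digits, trailing junk, or overflow of long long */
    }
    if (parsed < INT32_MIN || parsed > INT32_MAX) {
        return -1; /* valid number, but too big for the 32-bit field */
    }
    *value = (int32_t)parsed;
    return 0;
}
```

The sign travels as an ordinary `-` character, so neither the byte order nor the signed representation of either machine matters. A real protocol would still need a delimiter or a length prefix so the receiver knows where the number ends.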
Martin York
  • OK, so in addition to tracking size and endianness, one must also be careful with sign issues? I did not know the latter. It seems using unsigned int is safer in this respect? –  Jul 12 '14 at 01:50
  • @dmcr_code yes I think that pretty much sums it up. – Floris Jul 12 '14 at 02:31
  • @Floris: so if one uses unsigned int, these sign issues don't arise anymore? –  Jul 12 '14 at 10:59
  • @dmcr_code: yes, the representation of unsigned integers is fixed by the standard, so as long as both sides agree on endianness and size, you avoid the sign problem when there _is_ no sign... – Floris Jul 12 '14 at 12:16