In C, a typical way to bind a socket looks like this:

int server_socket_fd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
int port_number = 55555;

addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(port_number);

int bind_result = bind(server_socket_fd, (struct sockaddr *)&addr, sizeof(addr));
if (bind_result == 0)
{
    // Stuff (bind() returns 0 on success and -1 on error)
}

I am wondering why the cast from sockaddr_in to sockaddr works, since I can't find any documentation explaining it. It seems like everyone just does it.

Why does the typecast work here?

I am not asking why we cast it, this has been answered here. I am asking why it works.

Moritz Schmidt
  • Possible duplicate of [Why do we cast sockaddr\_in to sockaddr when calling bind()?](https://stackoverflow.com/q/21099041/608639) and [Casting structure pointers between structs containing pointers to different types?](https://stackoverflow.com/q/27120486/608639) – jww Jul 11 '18 at 14:27
  • I personally use a union for this case. – Stargateur Jul 11 '18 at 20:19

4 Answers

5

It is allowed to convert a struct pointer to a different struct pointer and back. This is detailed in section 6.3.2.3p7 of the C standard:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

There is a restriction in the above passage regarding alignment which is further detailed in section 6.2.5p28:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

Presumably, the bind function knows what kind of socket descriptor it has and converts the struct sockaddr * back to a struct sockaddr_in *.
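The round trip described in 6.3.2.3p7 can be sketched as follows. This is only an illustration, not how any real `bind` is implemented: the function name `round_trip_is_identity` is hypothetical, and the code assumes a POSIX system that provides `<netinet/in.h>` and `<sys/socket.h>`.

```c
#include <arpa/inet.h>   /* htons, htonl, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* struct sockaddr, AF_INET */

/* Hypothetical helper: returns 1 if converting a sockaddr_in pointer to a
 * sockaddr pointer and back yields the original pointer, and the member
 * read through the converted-back pointer has the value that was stored. */
int round_trip_is_identity(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(55555);

    /* What the caller of bind() does: present the address generically. */
    struct sockaddr *generic = (struct sockaddr *)&addr;

    /* What the callee is presumed to do: convert back to the real type. */
    struct sockaddr_in *back = (struct sockaddr_in *)generic;

    return back == &addr && ntohs(back->sin_port) == 55555;
}
```

Because the object really is a `struct sockaddr_in`, accessing it through `back` is an ordinary access to an object via its own type.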

dbush
  • "*Presumably, the `bind` function knows what kind of socket descriptor it has*" - yes, as the address family is part of the socket's internal info. `bind()` (and `connect()`) validates the passed in `sockaddr` matches the socket's address family. "*and converts the `struct sockaddr *` back to a `struct sockaddr_in *`*" - yes (for `AF_INET`) after checking that the passed `sockaddr` is the correct family AND byte size (which is why `bind()`, `connect()`, and `accept()` have a parameter to receive the size of the `sockaddr`). If a mismatch occurs, `bind()` (and `connect()` and `accept()`) fails. – Remy Lebeau Jul 12 '18 at 01:31
3

The sockaddr struct basically has only one meaningful field, the address family. Code that receives the structure can use this field to determine the structure's actual type. All the structures that are really used also have this field as their first member, so it can always be read reliably.

The implementations also make the structures the same size with padding, so the memory usage is also completely deterministic. This makes it work properly.

For example Microsoft defines the sockaddr structure in Visual Studio 2017 as

struct sockaddr {  
    unsigned short sa_family;  
    char sa_data[14];  
};

sa_data: Maximum size of all the different socket address structures.

So any “child” struct that may be sent must have 14 bytes of data in it, no more or less.

Whereas sockaddr_in is

struct sockaddr_in {
    short sin_family;
    unsigned short sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};

Here the port and in_addr require six bytes in total, so 8 bytes of padding are used to keep the size the same as sockaddr.
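The size relationship can be checked directly with `sizeof`. The sketch below assumes a POSIX system; the function names are hypothetical, and the exact numbers are platform-dependent, so none are hard-coded. On common platforms `sockaddr_in` typically has the same size as `sockaddr`, while `sockaddr_in6` is typically larger, which is what `sockaddr_storage` is for.

```c
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <stdio.h>       /* printf */
#include <sys/socket.h>  /* struct sockaddr, struct sockaddr_storage */

/* Hypothetical helper: prints the sizes this platform uses. */
void print_sockaddr_sizes(void)
{
    printf("sockaddr         %zu\n", sizeof(struct sockaddr));
    printf("sockaddr_in      %zu\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6     %zu\n", sizeof(struct sockaddr_in6));
    printf("sockaddr_storage %zu\n", sizeof(struct sockaddr_storage));
}

/* 1 if the IPv4 struct is padded to exactly the generic struct's size. */
int in_matches_generic_size(void)
{
    return sizeof(struct sockaddr_in) == sizeof(struct sockaddr);
}

/* 1 if the IPv6 struct does not fit in the generic struct. */
int in6_exceeds_generic_size(void)
{
    return sizeof(struct sockaddr_in6) > sizeof(struct sockaddr);
}
```

Calling `print_sockaddr_sizes()` from `main` shows the values for the platform at hand.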

Of course, it would be possible to create, for example, a sockaddr_un, set its address family to claim it is a sockaddr_in, and pass it on. Any code receiving the structure would then cast it to the wrong type and read completely wrong values.

Sami Kuhmonen
  • "*So any “child” struct that may be sent must have 14 bytes of data in it, no more or less*" - not true. `sockaddr` is defined the way it is to maintain size-compatibility with `sockaddr_in`, which is 16 bytes, and is what `sockaddr` *used* to be in the days when `AF_INET` was the only address family. But newer families are NOT limited to 14 bytes of data. For instance, `sockaddr_in6` has **26 bytes** (the `sin6_addr` field alone is **16 bytes**). The ONLY requirement is ALL `sockaddr_...` types MUST start with a 2-byte `family` member, whose value dictates the size of the remaining data. – Remy Lebeau Jul 12 '18 at 01:24
2

It works because the bind function only uses the first few fields, which apparently are the same in all members of the sockaddr struct family.
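A receiving function might dispatch on the shared leading family field roughly like this. This is a minimal sketch, not the code of any real socket implementation: `port_of_any` is a hypothetical name, a POSIX system is assumed, and only `AF_INET` and `AF_INET6` are handled.

```c
#include <arpa/inet.h>   /* ntohs, htons */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <stddef.h>      /* offsetof, NULL */
#include <string.h>      /* memset (handy for callers building addresses) */
#include <sys/socket.h>  /* struct sockaddr, socklen_t, AF_INET, AF_INET6 */

/* Hypothetical helper: returns the port in host byte order, or -1 if the
 * family is unsupported or the buffer is too short for that family. */
int port_of_any(const struct sockaddr *sa, socklen_t len)
{
    /* The buffer must at least cover the family field. */
    if (sa == NULL || len < (socklen_t)offsetof(struct sockaddr, sa_data))
        return -1;

    /* Conventional sockets idiom: inspect the shared family field first. */
    switch (sa->sa_family) {
    case AF_INET:
        if (len < (socklen_t)sizeof(struct sockaddr_in))
            return -1;
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        if (len < (socklen_t)sizeof(struct sockaddr_in6))
            return -1;
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}
```

The length check before each cast plays the same role as the `addrlen` parameter of `bind()`: the function never reads past the bytes the caller said it provided.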

Paul Ogilvie
  • The only common field in all `sockaddr_...` structs is the 1st field - the address `family`. Based on that value alone, `bind()` can cast the `sockaddr*` pointer to a more appropriate pointer type (`sockaddr_in*`, `sockaddr_in6*`, `sockaddr_un*`, etc). Couple that with the `addrlen` parameter, which `bind()` uses to check that the `sockaddr*` points to a buffer that is large enough for `bind()` to safely access all of the `sockaddr_...` fields after performing the cast. – Remy Lebeau Jul 12 '18 at 01:35
2

The sockaddr type is intended for use on quality compilers that are suitable for low-level programming. On such compilers, casting the address of a structure to a pointer of a different structure type that shares a common initial sequence will yield a pointer that may be used to inspect members of that common initial sequence at least until one of the following has occurred:

  1. Code writes those fields via means other than the derived pointer.

  2. Code forms a pointer, via means other than the derived pointer, that will be used to write those fields. [A compiler might reorder a subsequent write via the latter pointer back to the point where it is created, even if the write "should" be after the code that inspects the CIS member, but a compiler can't reorder such a write unless it can prove that it will be performed].

  3. Code enters a loop wherein one of the above will occur.

  4. Code calls a function wherein one of the above will occur.

From the point of view of the Standard, support for constructs like the above is essentially a quality-of-implementation issue. The Rationale expressly recognizes the possibility that a "conforming" implementation might be of such poor quality as to be useless. Any allowance for compilers that are too primitive to handle even simple cases like those described above (e.g. gcc and clang) should be viewed in that light.

supercat