
I'm learning the HTTP protocol by following a tutorial that provides an easy-to-understand piece of code. Here's part of it:

struct sockaddr_in address;
...
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons( PORT );

memset(address.sin_zero, '\0', sizeof address.sin_zero);


if (bind(server_fd, (struct sockaddr *)&address, sizeof(address))<0)
{
    perror("In bind");
    exit(EXIT_FAILURE);
}

The example code works well, although I don't understand the kind of transfer between two structs.

The definition of struct sockaddr_in in <netinet/in.h> is:

struct sockaddr_in {
    __uint8_t   sin_len;
    sa_family_t sin_family;
    in_port_t   sin_port;
    struct  in_addr sin_addr;
    char        sin_zero[8];
};

The definition of struct sockaddr in <sys/socket.h> is:

struct sockaddr {
    __uint8_t   sa_len;     /* total length */
    sa_family_t sa_family;  /* [XSI] address family */
    char        sa_data[14];    /* [XSI] addr value (actually larger) */
};

They have different structures, so how does the "transfer/casting" work here?

Jabberwocky
JJJohn
  • You must keep in mind that this interface was created at a time where there was no `void *` type in C to act as a generic pointer. – Gerhardh Nov 18 '21 at 10:39
  • @Gerhardh Thank you. Does "this interface" refer to `struct sockaddr_in`, `struct sockaddr` or something else? – JJJohn Nov 18 '21 at 10:55
  • I mean the whole socket interface with all the corresponding types and functions. – Gerhardh Nov 18 '21 at 10:56
  • The code presented does not involve any casting between these two structure types. What is cast is a *pointer* (to a different *pointer* type). Such pointer casts are explicitly allowed. Accessing the pointed-to object via the resulting pointer is a different question. – John Bollinger Nov 18 '21 at 14:57

3 Answers

The casting works. Looking at the two structures:

struct sockaddr_in {
    __uint8_t   sin_len;
    sa_family_t sin_family;
    in_port_t   sin_port;
    struct in_addr sin_addr;
    char        sin_zero[8];
};

struct sockaddr {
    __uint8_t   sa_len;     /* total length */
    sa_family_t sa_family;  /* [XSI] address family */
    char        sa_data[14];    /* [XSI] addr value (actually larger) */
};

The first two members, sin_len/sa_len and sin_family/sa_family, are not problematic, because each pair has the same data type. The padding after sa_family_t therefore works exactly the same in both structures. Looking at the POSIX <netinet/in.h> reference:

in_port_t Equivalent to the type uint16_t as described in <inttypes.h>
in_addr_t Equivalent to the type uint32_t as described in <inttypes.h>

On Windows, struct in_addr looks like this:

struct in_addr {
    union {
        struct {
            u_char s_b1;
            u_char s_b2;
            u_char s_b3;
            u_char s_b4;
        } S_un_b;
        struct {
            u_short s_w1;
            u_short s_w2;
        } S_un_w;
        u_long S_addr;
    } S_un;
};

and on Linux it is:

struct in_addr {
   uint32_t s_addr;     /* address in network byte order */
};

The confusion you might have comes from how the contents line up. However, this is a well-thought-out historical design, intended to accommodate implementation-dependent details. "Implementation-dependent" here refers to the fact that struct in_addr is not defined consistently across all systems, as seen above.

In a nutshell, this works because of two things: the first two members have exactly the same size and padding in both structures, and sa_data[14] is an array of char, i.e. of a 1-byte data type, so it imposes no extra alignment of its own. This design trick of a union inside a struct has been widely used.

Unix Network Programming Volume 1 states:

The reason the sin_addr member is a structure, and not just an in_addr_t, is historical. Earlier releases (4.2BSD) defined the in_addr structure as a union of various structures, to allow access to each of the 4 bytes and to both of the 16-bit values contained within the 32-bit IPv4 address. This was used with class A, B, and C addresses to fetch the appropriate bytes of the address. But with the advent of subnetting and then the disappearance of the various address classes with classless addressing, the need for the union disappeared. Most systems today have done away with the union and just define in_addr as a structure with a single in_addr_t member.

Not what you asked for, but good to know:

The same header states:

The sockaddr_in structure is used to store addresses for the Internet address family. Values of this type shall be cast by applications to struct sockaddr for use with socket functions.

So sockaddr_in is a struct specific to IPv4 communication, while sockaddr is more of a generic structure for socket operations.

A quick check:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    printf("sizeof(struct sockaddr_in) = %zu bytes\n", sizeof(struct sockaddr_in));
    printf("sizeof(struct sockaddr) = %zu bytes\n", sizeof(struct sockaddr));
    return 0;
}

Prints:

sizeof(struct sockaddr_in) = 16 bytes
sizeof(struct sockaddr) = 16 bytes
WedaPashi
  • Thank you so much. Does "padding" here mean all the fields after the field `sin_family/sa_family`? – JJJohn Nov 18 '21 at 11:18
  • Not really. When the term padding is used with respect to structure members, it should be interpreted as padding that structure member to natural address boundaries. So, not just 'all' fields, but the address locations after `sa_family` up to the natural boundary, which is *usually* 4 bytes. – WedaPashi Nov 18 '21 at 11:49
  • This all might be a little complicated to grasp if you aren't already aware of structure padding concept in general. [This might help](https://stackoverflow.com/questions/4306186/structure-padding-and-packing). – WedaPashi Nov 18 '21 at 11:52
  • How can you be sure that the compiler is not aligning `sin_addr` on a 32-bit boundary? If it does, the two structures are not the same size. – Guillaume Petitjean Nov 18 '21 at 13:04

I don't understand the kind of transfer between two structs.

There is no data transfer between different structs, nor any conversion of structure objects. In bind(server_fd, (struct sockaddr *)&address, sizeof(address)), a pointer to a struct is converted to a different object pointer type. This is explicitly allowed by C.

The C language specification does not define any behavior for accessing the struct via the converted pointer. Any attempt to do so would violate the strict aliasing rule, but that's not your problem. The example you presented demonstrates an utterly standard usage idiom for the bind() function, for which it was designed. Therefore, you can rely on the bind() implementation to do the right thing with it, by whatever magic is required.

Conceptually, though, you can observe that the first two members of struct sockaddr and struct sockaddr_in have the same data types. You could imagine, then, that bind is able to access those two members via the converted pointer, despite it constituting a strict-aliasing violation. Although C does not define behavior for that, POSIX implicitly requires that it work in at least this case. Having then done that, the second of those members indicates the address family, by which bind() can invoke the appropriate behavior for the address's actual type.

That is a variation on C-style polymorphism. It is helped out by the third bind argument, the size of the address object, which enables bind() to copy the address object without knowing its true effective data type.

These structure types and the bind() API could have been defined a bit differently to avoid the implied strict-aliasing violation, but that wasn't necessary in early C, where member names corresponded directly to offsets from the beginning of the structure, and where those names were global, which is why you see the sin_ and sa_ prefixes in those member names, and similar in many other structure types provided by the system. Nowadays, it's best to just accept that that's how bind() is used, and it's up to the system to provide a bind() implementation that accommodates it.

John Bollinger

I think this cast breaks the strict aliasing rule, and is therefore undefined behaviour if the bind function dereferences the pointer.

In practice, the code assumes that all fields of struct sockaddr_in are contiguous, so that a buffer of bytes can be accessed equivalently as a struct sockaddr_in or as a struct sockaddr. But the fields of a structure are not guaranteed to be contiguous. If in_port_t is two bytes long, for example, there may very well be a hole between sin_port and sin_addr on a 32-bit machine, because the compiler may want to align the sin_addr field on a 32-bit boundary.

This way of coding is frequent when you develop a communication interface driver: you receive a buffer of bytes that needs to be interpreted as a data structure (e.g. the first byte is an address, the following bytes are a length, etc.). Casting from one structure to another avoids copying data.

Note that compilers usually provide non-standard ways to guarantee that all fields of a structure are contiguous. For example, with gcc it is __attribute__((packed)).

Now, to answer your question: provided the structures are packed and there is no undefined behaviour, the cast basically does nothing. sa_data will be the array of bytes located after the field sin_family, so this array will consist of sin_port, followed by sin_addr, followed by the array sin_zero.

EDIT: I compiled the following structures on an STM32H7 (ARM Cortex-M7, 32-bit architecture) with arm-none-eabi-gcc:

#include <stdint.h>

/* Test copies of the socket structures, with sa_family_t modelled as 16 bits */
struct in_addr {
    uint32_t s_addr;
};
struct sockaddr_in {
    uint8_t sin_len;
    uint16_t sin_family;
    uint16_t sin_port;
    struct in_addr sin_addr;
    char     sin_zero[8];
};
struct sockaddr {
    uint8_t sa_len;
    uint16_t sa_family;
    char     sa_data[14];
};

The size of sockaddr_in is 20.

The size of sockaddr is 18.

Note that if sa_family_t is a char rather than a short, both structures have the same size (16 bytes), because no padding is then needed before sin_port or sin_addr.

Remy Lebeau
Guillaume Petitjean
  • It does not matter if there is padding inside the struct. The first 2 fields have the same type. Therefore the padding will be the same. And the first 2 fields are sufficient to determine what kind of structure is required to access the whole data. – Gerhardh Nov 18 '21 at 10:30
  • there might be padding between `sin_port`, `sin_addr` and `sin_zero`, in this case the code would not work. – Guillaume Petitjean Nov 18 '21 at 10:32
  • As I mentioned, the padding will be the same for both structs as long as the first 2 members have the same type. The code will work. As you can see, the structs are not packed in the OP's code. Only these first 2 fields need to be read before the called function can use the correct type to access the other fields. – Gerhardh Nov 18 '21 at 10:34
  • As `sa_family_t` likely is a `short`, there is a good chance, that there really is some padding in the structs and it obviously works. – Gerhardh Nov 18 '21 at 10:36
  • "As I mentioned, the padding will be the same for both structs as long as the first 2 members have the same type"; Don't understand. What about the next 3 fields? – Guillaume Petitjean Nov 18 '21 at 10:41
  • The padding after `sa_family` does not matter indeed. – Guillaume Petitjean Nov 18 '21 at 10:42
  • I've never said the code was not working; actually, I have often implemented this kind of stuff. But it is not portable, not MISRA-C compliant (if it matters), and must be used with care. – Guillaume Petitjean Nov 18 '21 at 10:43
  • You mentioned "in this case the code would not work." As these fields will only be accessed using the correct struct type, it will work. Of course you will get a hard time running socket interface through a MISRA or HIS checker. – Gerhardh Nov 18 '21 at 10:46
  • I might be missing something but I'm still convinced that both structures are not equivalent since there might be some padding between `sin_port` and `sin_addr` whereas `sa_data` assumes there is not. Of course it may have no impact depending on what the `bind` function is doing with the pointer. – Guillaume Petitjean Nov 18 '21 at 13:11