Why struct in_addr is needed

Question

I have already searched, though perhaps not well, and I still don't understand why the 'in_addr' structure exists.

typedef uint32_t in_addr_t;

struct in_addr
{
    in_addr_t s_addr;
};

The question is whether this structure is important or not. If it doesn't matter then I can write like this:

*(uint32_t*)(&s->sin_addr) = to_be32(ip->addr);

If it matters then I should write like this:

s->sin_addr.s_addr = to_be32(ip->addr);

On Windows it's a union which, I think, also exposes the 4 individual segments of the address — user253751, Aug 20 '23 at 21:22
Note that `*(uint32_t*)(&s->sin_addr) = to_be32(ip->addr);` is likely to be a [strict aliasing violation](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) and if so it will invoke undefined behavior. — Andrew Henle, Aug 21 '23 at 22:39

Jeremy Friesner · Answer 1 · 2023-08-20T18:20:03.043

Since the address of the first member-variable in a struct is guaranteed to be the same as the address of the struct itself, and since the chances of that particular struct ever being modified are close to zero (since doing so would break backwards-compatibility with just about every already-compiled networking executable in the world), in this case it doesn't really matter.

In principle, however, you can imagine a scenario where someone inserts another member-variable at the top of struct in_addr, at which point your first example (with the casting) would break, while the second one would continue to work (after a recompile, of course).
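That scenario can be sketched with a made-up struct. `struct hypothetical_addr`, its `flags` member, and both helper functions below are assumptions for illustration only; no real system defines `in_addr` this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, NOT a real system definition: it imagines a new
 * member placed ahead of the address field. */
struct hypothetical_addr {
    uint32_t flags; /* imagined leading member */
    uint32_t addr;  /* plays the role of s_addr */
};

/* Cast-based store: always writes whatever member happens to come first. */
static void store_by_cast(struct hypothetical_addr *a, uint32_t value)
{
    assert(a != NULL);
    *(uint32_t *)a = value;
}

/* Member-based store: the compiler resolves the offset of addr. */
static void store_by_member(struct hypothetical_addr *a, uint32_t value)
{
    assert(a != NULL);
    a->addr = value;
}
```

With this layout the cast silently lands in `flags`, while the member access keeps writing to `addr` after a recompile.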

As a general rule, avoiding explicit casts whenever possible is a good idea, because the casting tends to override the compiler's reporting of errors due to programming mistakes, leading to what could have been a compiler-error turning into a run-time failure that impacts users and/or takes a good amount of detective work to diagnose and fix.

As for why the in_addr struct is needed: when the BSD sockets API was designed, it was meant to support an arbitrary number of networking protocols equally well, even though each protocol has its own addressing scheme with different data to keep track of. The easiest way to do that (from a software-design perspective) is to give each protocol its own set of defined structs, and to design the API calls to take a pointer-to-whatever-type-of-struct along with a struct-size value, so that the same function calls can be used with any supported protocol. So in_addr is the struct defined for the IPv4 protocol's idea of a network address. The fact that it has only one member variable is just a consequence of the (relative) simplicity of IPv4's addressing scheme, and that wasn't considered worth breaking the struct-per-protocol abstraction over.
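The pointer-plus-size convention described above might look roughly like this from the receiving side. `format_addr` is a hypothetical helper written for this sketch, not part of the sockets API; the types and `inet_ntop` are the standard POSIX ones.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accepts an address the way sockets calls do, as a
 * generic pointer plus a length, and formats it according to its family.
 * Returns out on success, NULL on an unsupported family or short length. */
static const char *format_addr(const struct sockaddr *sa, socklen_t len,
                               char *out, socklen_t outlen)
{
    assert(sa != NULL && out != NULL);
    switch (sa->sa_family) {
    case AF_INET: {
        struct sockaddr_in in4;
        if ((size_t)len < sizeof in4)
            return NULL;
        memcpy(&in4, sa, sizeof in4); /* copy out instead of casting back */
        return inet_ntop(AF_INET, &in4.sin_addr, out, outlen);
    }
    case AF_INET6: {
        struct sockaddr_in6 in6;
        if ((size_t)len < sizeof in6)
            return NULL;
        memcpy(&in6, sa, sizeof in6);
        return inet_ntop(AF_INET6, &in6.sin6_addr, out, outlen);
    }
    default:
        return NULL;
    }
}
```

The single cast to `const struct sockaddr *` at the call site is the one the API itself requires; each protocol's own struct (`in_addr`, `in6_addr`) stays intact behind it.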

Very good answer, but long. +1 — Penguin, Aug 20 '23 at 18:41

Clifford · Accepted Answer · 2023-08-22T19:03:08.017

Yes, it matters, because it hides implementation details. Just because it contains a single in_addr_t in your case does not mean that will be so on all systems. Moreover, in_addr_t need not even be compatible with uint32_t, as you have assumed.

The Socket API need not even be specific to TCP/IP - all these types are abstracted to allow your code to be portable across both network protocols and operating systems.

So, while it may work, there is no real advantage. In any case I would argue that s->sin_addr.s_addr = to_be32(ip->addr); is simpler and easier to understand and, more importantly, does not require you to know the underlying implementation. So *(uint32_t*)(&s->sin_addr) = to_be32(ip->addr); may "work", but it is not correct or safe, nor does it serve any useful purpose. It is even a longer expression!
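A minimal sketch of the member-access form in context follows. `set_ipv4` is a hypothetical helper, and `htonl` stands in for the question's `to_be32`, on the assumption that `to_be32` converts a host-order value to big-endian.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in an IPv4 socket address from a host-order
 * 32-bit value, going through the named member rather than a cast. */
static void set_ipv4(struct sockaddr_in *s, uint32_t host_order_addr)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->sin_family = AF_INET;
    /* htonl is assumed here to match what to_be32 does in the question. */
    s->sin_addr.s_addr = htonl(host_order_addr);
}
```

Nothing in this code depends on where `s_addr` sits inside `struct in_addr`, or on what else the struct contains.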

score 0 · Answer 3 · answered Aug 21 '23 at 22:18

While Linux systems define struct in_addr in this way, it's not necessarily like that in all cases.

In fact, the definition of this struct on Windows (and, as I recall, on older Solaris systems and possibly the original BSD implementation) is as follows:

struct in_addr {
        union {
                struct { u_char s_b1,s_b2,s_b3,s_b4; } S_un_b;
                struct { u_short s_w1,s_w2; } S_un_w;
                u_long S_addr;
        } S_un;
#define s_addr  S_un.S_addr
                                /* can be used for most tcp & ip code */
#define s_host  S_un.S_un_b.s_b2
                                /* host on imp */
#define s_net   S_un.S_un_b.s_b1
                                /* network */
#define s_imp   S_un.S_un_w.s_w2
                                /* imp */
#define s_impno S_un.S_un_b.s_b4
                                /* imp # */
#define s_lh    S_un.S_un_b.s_b3
                                /* logical host */
};

This gives access to specific bytes of the address, although such access is rarely useful.

While your alternate access would still work in this particular case, as the address of a struct is the same as that of its first member and the address of a union is the same as all of its members, you're still making an assumption about implementation details you shouldn't be making.
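If some code really must rely on that layout, one option is to state the assumption explicitly so that a platform where it does not hold fails at compile time. This is only a sketch for POSIX-style headers; `in_addr_starts_with_s_addr` is a hypothetical helper, and plain member access avoids the need for any of it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time statement of the layout assumptions the cast relies on.
 * The build fails on any platform where they do not hold. */
static_assert(offsetof(struct in_addr, s_addr) == 0,
              "s_addr is not at the start of struct in_addr");
static_assert(sizeof(struct in_addr) == sizeof(uint32_t),
              "struct in_addr is not exactly 32 bits wide");

/* Hypothetical runtime check of the same property: returns nonzero when
 * the struct and its s_addr member share an address. */
static int in_addr_starts_with_s_addr(const struct in_addr *a)
{
    assert(a != NULL);
    return (const void *)a == (const void *)&a->s_addr;
}
```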
