
Everywhere I look, I see the following piece of code:

struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(port);
addr.sin_addr.s_addr = ip;

In C++, the same idea is usually expressed as

sockaddr_in addr = {}; // unnecessary(?) value-initialization
addr.sin_family = AF_INET;
addr.sin_port = htons(port);
addr.sin_addr.s_addr = ip;

Yet nowhere in the official documentation do I see any requirement to zero out the structure before setting those members. Yes, BSD sockets implementations usually define a sin_zero member for sockaddr_in, but they always describe it as padding that brings sockaddr_in up to the size of sockaddr, and they never ask for any specific contents to be put into it.
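The padding claim can be checked on a given platform with a small sketch like the one below. It assumes the implementation exposes a `sin_zero` member, which is not among the members POSIX requires, so it may not compile everywhere. Both helper names are made up for illustration.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: returns 1 if sockaddr_in has the same size as sockaddr. */
static int sockaddr_in_matches_sockaddr(void)
{
    return sizeof(struct sockaddr_in) == sizeof(struct sockaddr);
}

/* Hypothetical helper: size of the padding member in bytes.
   Assumption: sin_zero exists on this platform (an extension, not required by POSIX).
   The sizeof operand is not evaluated, so the null pointer is never dereferenced. */
static size_t sin_zero_size(void)
{
    return sizeof(((struct sockaddr_in *)0)->sin_zero);
}
```

If the first helper returns 1, the padding explanation holds on that platform. It says nothing about whether the padding bytes must be zero.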

Is there any real, documented requirement to zero out the struct?

P.S. Before you VTC this question as a duplicate of one of the several SO questions about memset on sockaddr_in, please make sure the question you are suggesting as a duplicate links to official documentation rather than just speculating about 'initializing unused members just in case'.

user207421
SergeyA
  • Note: unbalanced parentheses: `memset(&addr, 0, sizeof(addr);` -->> `memset(&addr, 0, sizeof addr);` – joop Mar 18 '16 at 14:28
  • I removed the [tag:language-lawyer] tag since it makes no sense in this context. – edmz Mar 18 '16 at 14:28
  • @black, it does make sense, and this is what the question is about. Here is the tag description: "For questions about the intricacies of formal or authoritative specifications of programming languages and **environments**". I am asking for an authoritative specification. – SergeyA Mar 18 '16 at 14:29
  • @SergeyA I must say I was not aware of the 'environment' part. But it is very misleading. Furthermore, all the questions I've seen with that tag were about language formalities: it is, after all, _language_ lawyer, not _environment_ lawyer, also because languages are independent of the environment, let alone formal specifications. I guess a question on meta is worthwhile. – edmz Mar 18 '16 at 14:40
  • @black, we do not have 'environment-lawyer', so I've chosen the formally matching tag. If you do ask a meta question, I'd love to see the link. – SergeyA Mar 18 '16 at 14:42
  • It is a cargo cult, like casting the return value of `malloc`. There seems to be no formal need to zero the padding. But there are anecdotal reports that failure to do so might lead to bad consequences in APIs that internally reinterpret `sockaddr_in` as `sockaddr`. – AnT stands with Russia Mar 18 '16 at 14:46
  • BTW: you don't *need* the extra pair of parentheses. `memset(&addr, 0, sizeof addr);` is correct, since `addr` is not a type name. (They won't do any harm except to the human reader, comparable to the memset() thing, which is most probably cargo cult.) – joop Mar 18 '16 at 14:47
  • They've finally stopped using `bzero`? –  Mar 18 '16 at 14:47
  • @AnT, I share the same belief. Yet I've seen 200K+ StackOverflow posters (not mentioning any names here! ;) who truly believe it is necessary. I'd like to have formal closure on this. – SergeyA Mar 18 '16 at 14:47
  • @joop, that I know. I just love those parentheses, they please me aesthetically. – SergeyA Mar 18 '16 at 14:48
  • @WumpusQ.Wumbley, well, some of them still do, "they certainly do" (C). – SergeyA Mar 18 '16 at 14:48
  • They confuse me; I always think they are actually needed, while they are not. Just like the memset(), it is cargo cult and defensive programming. – joop Mar 18 '16 at 14:51
  • @joop, nah, they are an aesthetic thing for me. Like spacing between the function name and the opening parenthesis, space before/after `&` and `*`, same line or next line for `{`. You can use whatever you want in your code, but please don't police other people's code. – SergeyA Mar 18 '16 at 14:52
  • @SergeyA I opened a question on Meta, you may want to check it out [here](https://meta.stackoverflow.com/questions/319310/what-are-environments-in-the-language-lawyer-tag). Thanks for bringing up the point. – edmz Mar 18 '16 at 15:27
  • About the *// unnecessary(?) value-initialization* comment: http://stackoverflow.com/a/1069634/613130 – xanatos Mar 18 '16 at 15:44
  • @xanatos, I didn't quite get what you are saying here. – SergeyA Mar 18 '16 at 15:50
  • @SergeyA The comment in the code seems to imply that `sockaddr_in addr = {}` is equivalent to `sockaddr_in addr;`, so that in C++ both will zero all the fields. The link I provided says the opposite: because `sockaddr_in` doesn't have a constructor that initializes the fields, `sockaddr_in addr;` leaves its fields with unspecified values. – xanatos Mar 19 '16 at 10:30
  • @xanatos, no, it doesn't imply that. It implies that the code performs value initialization, which is not necessary. It is the equivalent of the C memset. – SergeyA Mar 19 '16 at 12:57

2 Answers


Short answer:

The IEEE Standard doesn't require it.

But, I think it's best to be safe and zero out everything.


Long(er) answer:

The IEEE Standard 1003.1 specifies that the definition of sockaddr_in is (Emphasis mine):

The <netinet/in.h> header shall define the sockaddr_in structure that includes **at least** the following members:

sa_family_t     sin_family   AF_INET.
in_port_t       sin_port     Port number.
struct in_addr  sin_addr     IP address.

Compare this with the definition of sockaddr_in6, which explicitly requires the structure to be zeroed out:

The sockaddr_in6 structure shall be set to zero by an application prior to using it, since implementations are free to have additional, implementation-defined fields in sockaddr_in6.

There is no similar wording for sockaddr_in. However, that silence leaves platform implementors free to impose their own requirements about zeroing out all or part of sockaddr_in.
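For the sockaddr_in6 case, where the standard does ask for zeroing, a minimal sketch might look like the following. The helper name `make_ipv6_addr` and its parameters are made up for illustration; only the standard `memset`, `htons`, and `inet_pton` calls are assumed to exist.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *out from a textual IPv6 address and a port in
   host byte order. Returns 0 on success, -1 if text is not a valid address. */
static int make_ipv6_addr(struct sockaddr_in6 *out, const char *text, uint16_t port)
{
    /* Zero first, so sin6_flowinfo, sin6_scope_id and any
       implementation-defined extra fields start out as 0. */
    memset(out, 0, sizeof *out);
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);
    return inet_pton(AF_INET6, text, &out->sin6_addr) == 1 ? 0 : -1;
}
```

Zeroing before assignment means fields the caller never touches are still in a known state, which is the situation the quoted sentence describes.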

Note that the definition of sockaddr_in used to require a sin_zero field to pad out the structure to make it compatible with sockaddr structures:

The sin_zero member was removed from the sockaddr_in structure as per The Open Group Base Resolution bwg2001-004.

It is with sin_zero that we find a discrepancy between Windows and Linux. Even though the field was removed from the official definition, both the Windows and Linux implementations still include it (which is not explicitly illegal, thanks to the wording "at least").

Whether sin_zero always requires zeroing out or not for Windows platforms isn't clear, but in this blog post the writer did some digging and came up with the following:

On some architectures, it wont cause any problems not clearing sin_zero. But on other architectures it might. Its required by specification to clear sin_zero, so you must do this if you intend your code to be bug free for now and in the future.

For the part about "It's required by specification to clear sin_zero", I could only find the following Windows documentation (for Winsock Kernel) appearing to support the claim:

A WSK application should set the contents of this array to zero.

However, I can find no similar wording for Linux.

So to conclude, it appears that on some architectures you need to zero out at least one field, while on others you don't. I think it's best to be safe and zero out everything.
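The "zero out everything" approach can be wrapped in a small helper, sketched here under the assumption that the caller already has the address in network byte order. The name `make_ipv4_addr` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *out with an IPv4 address (network byte order)
   and a port (host byte order), clearing every other byte of the struct. */
static void make_ipv4_addr(struct sockaddr_in *out, in_addr_t ip, uint16_t port)
{
    /* Clears sin_zero and any other implementation-specific members. */
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    out->sin_addr.s_addr = ip;
}
```

Because the whole object is cleared first, two structures filled with the same arguments compare equal byte for byte, whatever the memory held before.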

Gabriel Staples
AndyG
  • This doesn't answer my question. First, it makes moot points about this being a non-C/C++ standard (everybody knows that), then it goes into lengthy **incorrect** details about Linux sockaddr_in implementations (it **surely** has a padding member to conform to the size of the sockaddr structure) and lastly it talks about Windows Kernel Sockets applications, which my question is not about. – SergeyA Mar 18 '16 at 15:48
  • The discussion of architecture-specific implementations seems relevant because of the non-standardness. Apologies if I got any detail wrong about the Linux implementation. I don't intend to mislead, and I thought the source was reliable. – AndyG Mar 18 '16 at 16:06
  • @SergeyA: I did a little more digging and updated the post with what I hope is better sourcing. – AndyG Mar 18 '16 at 18:42
  • I am sorry, but this is not the answer at all. It is a lengthy statement that you don't know the answer. – SergeyA Mar 18 '16 at 18:47
  • @SergeyA: What I understood was that your question was asking for official documentation, and in this post I show that the official documentation makes no requirement. Would you prefer that I remove all the background information (or just show the relevant excerpt from the standard)? I thought the rest was informative, but I could understand an argument against a wall of text. – AndyG Mar 18 '16 at 18:50
  • Just squeeze it a bit. The note about the IEEE standard not requiring it is a good one. The mention of the AF_INET6 zeroing requirement is also a good one (not my question, but a good reference). Remove the details on `struct sockaddr`, the question is not about it. Do not repeat yourself. And then it would be an answer worth accepting! :) – SergeyA Mar 18 '16 at 18:54
  • @SergeyA: I've cut out a large amount of the discussion. Please let me know if I worded anything incorrectly or ambiguously. – AndyG Mar 18 '16 at 19:07
  • Ok, I do not agree with conclusion (it appears that on **neither** architecture you need to zero the field unless you are writing winsock kernel apps!) but conclusions are left to the readers anyway. I am accepting this answer (and retracting my downvote:) – SergeyA Mar 18 '16 at 19:12
  • @SergeyA: Thank you. You certainly made me work for it, and I appreciate the opportunity to improve! – AndyG Mar 18 '16 at 19:18
  • As of the 2018 version of the IEEE/Open Group specifications (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/netinet_in.h.html) there is now a clarifying (but non-normative) "Application Usage" section, which confirms that for sockaddr_in (as opposed to sockaddr_in6) there is no requirement to zero-init, but that you are "encouraged" to do so. – Peter Maydell Aug 13 '21 at 15:42

A bit late to the party, but here's something relevant that I think is a worthwhile addition to the discussion. In the 3rd edition of “Unix Network Programming” by Richard Stevens, on page 70, there's this:

“...when binding non-wildcard IPv4 address, this member must be zero (pp. 731-732 of TCPv2)”

Stevens here mentions "TCPv2" - and the "member" he is talking about is sin_zero. This seems to fit with the profile of your question... at least in the context of bind-ing. Notice that he uses the word must.

Whether this answer qualifies for "official documentation" or not... I leave that up to you! But as others have said, regardless of whether it's official or not... it's for sure a good idea to memset to zero (or use = {0}; at structure variable definition). In this way, you have one less thing to worry about...
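As an illustration of the `= {0}` form in the bind case, here is a minimal sketch. The helper name `bind_ipv4` and its parameters are made up for this example, and error handling is left to the caller.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: bind fd to an IPv4 address (network byte order)
   and a port (host byte order). Returns bind()'s result: 0 or -1. */
static int bind_ipv4(int fd, in_addr_t ip, uint16_t port)
{
    /* = {0} zero-initializes every member, including sin_zero where present. */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = ip;
    return bind(fd, (const struct sockaddr *)&addr, sizeof addr);
}
```

A caller might use it as `bind_ipv4(fd, htonl(INADDR_LOOPBACK), 8080)` on a descriptor obtained from `socket(AF_INET, SOCK_STREAM, 0)`, which is the non-wildcard case Stevens describes.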

ttsiodras
  • Here "TCPv2" refers to Stevens' own book *TCP/IP Illustrated,* volume 2, not to any version 2 of the TCP protocol. And certainly neither of these books is official documentation, although extremely handy of course. – user207421 Apr 01 '22 at 05:54