So, my understanding is that the following code:

somestruct_t a = {0};
somestruct_t b;
b = a;

is always preferable, when possible, to:

somestruct_t a;
somestruct_t b;
memset(&a, 0, sizeof(a));
memcpy(&b, &a, sizeof(a));

And the top constructs are almost always possible, which leads me to my question: since the top code performs well and is, to me, clearly more intuitive to someone learning the language, why are the memset and memcpy patterns so amazingly prevalent in C and even non-OO C++ code? Literally every project I've worked on for decades prefers the bottom pattern.

I'm assuming there is some historical reason for it, such as very old compilers not supporting it or somesuch, but I'd very much like to know the specific reason.

I know general history questions are off-topic, but this is about a very specific bad practice that I would like to understand better.

EDIT: I am NOT trying to assert that memcpy and memset are bad in general. I'm talking about a very specific usage pattern: assignment or initialization of a single structure.

zzxyz
  • It looks to me that your initial premise (*why are the memset and memcpy patterns so amazingly prevalent in C...*) is just not true. – Eugene Sh. Jul 13 '18 at 18:03
  • Try allocating a 64MB array on the stack – William Pursell Jul 13 '18 at 18:03
  • Also good to keep in mind that the former syntax is (I believe) part of C99, and a lot of code has been written for compilers before C99. – vcsjones Jul 13 '18 at 18:04
  • Microsoft and other code examples for Win32 / WinAPI can date back to the early 1990s, so anyone copying them from MSDN, etc. today will still use memset( sizeof( foo ) ) in that copied code. – Dave S Jul 13 '18 at 18:08
  • `somestruct_t somestruct = {};` --> "error: ISO C forbids empty initializer braces [-Wpedantic]". Use `somestruct_t somestruct = { 0 };` – chux - Reinstate Monica Jul 13 '18 at 18:09
  • If that's correct about C99, that answers my question. That's much more recent than I was led to believe. – zzxyz Jul 13 '18 at 18:10
  • I think the pre-C99 answer is most of it. Also, you do something for 20 years, it's kinda hard to break the habit - especially when the people okaying new code are often experienced with coding from this time. – Michael Dorgan Jul 13 '18 at 18:11
  • `memset` and `memcpy` aren't normally used in object-oriented code. They're the mainstay of procedural C applications that don't have allocators like `new` or copy/move constructors. – tadman Jul 13 '18 at 18:14
  • Thanks @chux - that comment saved me some future trouble :) – zzxyz Jul 13 '18 at 18:20
  • Note that while you can use `calloc()` to zero-initialize dynamically allocated memory, it is relatively seldom used — certainly in the code I work with. If you want to initialize dynamically allocated memory with some other initialization than 'all zeros', you can't readily manage it with initializers or standard functions. – Jonathan Leffler Jul 13 '18 at 18:54
  • @tadman what does that comment mean in `C` tag? – SergeyA Jul 13 '18 at 19:01
  • I think you're confusing the use of pointers in C to pass in a pointer instead of a copy, with the copying of structures in. Both approaches are valid, it really just depends on the use-case. Do you want a copy or will a pointer suffice? – tadman Jul 13 '18 at 19:19
  • You can't assign an array to another array. – Jonathon Reinhart Jul 13 '18 at 22:17
  • I've edited the code in my question for (hopefully) greater clarity and a stronger example of where memset and memcpy are inappropriate imo. – zzxyz Jul 13 '18 at 22:21
  • @EugeneSh. - That's interesting, and I certainly may be off-base with that premise in certain areas. But I do mean literally every single project I've ever worked on (over 20). But it is fair to say I've worked with relatively little modern C. – zzxyz Jul 13 '18 at 22:33
  • Even better: `somestruct_t b = a;`, amazing, no? ;) – Stargateur Jul 13 '18 at 23:17
  • Once upon a very long time ago, structure assignment was not a part of C; read K&R 1st Edition from 1978 to see that documented (but with a forecast that it would be fixed before long; it was by the time I started using C five years later). I'm still cleaning up code with roots from 1982 that has structure assignment implemented by macro. (As in, when I'm not looking here, I'm fiddling with code that has such macros in place.) – Jonathan Leffler Jul 14 '18 at 00:02
  • See also my [comments](https://stackoverflow.com/questions/50808782/what-does-impossibility-to-return-arrays-actually-mean-in-c/50808867#comment88623336_50808782) on a loosely related question — quoting K&R 1st Edn accurately. (It took a while to find that, not least because it wasn't actually an answer.) – Jonathan Leffler Jul 14 '18 at 00:15

3 Answers

It sounds like your experience is significantly different from mine, and from that of several of the other commenters here.

I don't know anyone who prefers

memcpy(&a, &b, sizeof(a));

over

a = b;

In my programming world (and in just about any world I can imagine), simple assignment is vastly preferable to memcpy. memcpy is for moving chunks of arbitrary data around (analogous to strcpy, but when it's arbitrary bytes instead of null-terminated strings). It's hard to imagine why anyone would advocate using memcpy instead of struct assignment. Naturally there are individual programmers everywhere who have gotten into various bad habits, so I guess I can't be too surprised if there are some who prefer the opposite, but I have to say, I would generally disagree with what they're doing.

Someone speculated in the comments that there was perhaps some historical precedent at work, but at least for the memcpy-versus-assignment question, I can state with some certainty that this is not the case.

Once upon a time, before there was C90 memcpy, there was BSD bcopy, but before there was bcopy there wasn't a standard function for doing an efficient copy of a bunch of bytes from point a to point b. But there was struct assignment, which really has been in the language almost from the beginning. And struct assignment typically uses a nice, tight, compiler-generated byte-copying loop. So there was a time when it was fashionable to do something like this:

#define bcpy(a, b, n) (*(struct {char x[n];} *)a = *(struct {char x[n];} *)b)

I may have gotten the syntax wrong, but this hijacks the compiler's ability to do efficient struct assignment, and repurposes it to copy n bytes from arbitrary pointer b to arbitrary pointer a, i.e. just like bcopy or memcpy.

In other words, it's not like memcpy came first, followed by struct assignment -- it was actually exactly the opposite!

Now, memset versus struct initialization is a different story.

Most of the "clean" ways of zeroing a struct are initializations, but of course it's not uncommon to want to set a struct to all zero at some point later than when it was defined. It's also not uncommon to have a dynamically-allocated struct that was obtained with malloc/realloc rather than calloc. So in those cases, memset is attractive. I think modern C has struct constants (C99 compound literals) you can use at any time, but I'm guessing I'm not the only one who still hasn't learned them and so still tends to use memset instead.
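
The "struct constants" mentioned above are C99 compound literals. As a minimal sketch of resetting a struct after its definition, assuming a hypothetical `struct point` and a hypothetical helper `point_reset` (neither appears in the original answer):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type, not from the original answer. */
struct point {
    int x;
    int y;
    const char *label;
};

/* Reset an existing struct by assigning a C99 compound literal.
 * Every member ends up zero (or a null pointer for `label`). */
static void point_reset(struct point *p)
{
    assert(p != NULL);
    *p = (struct point){0};
}
```

This needs a C99 or later compiler; on older toolchains, assigning from a zero-initialized object of the same type gives the same effect.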

So I wouldn't consider using memset to be poor style, not in the same way as memcpy is poor style for struct assignment.

Although I have seen, and written, code that did something like

struct s zerostruct = { 0 };

and then later

a = zerostruct;

as a "better style" alternative to

memset(&a, 0, sizeof(a));   

Bottom line: I wouldn't agree that memcpy is recommended over struct assignment, and I am critical of anyone who prefers it. But memset is quite useful (and not disrecommended) for zeroing structures, because the alternatives aren't nearly as compelling.

Steve Summit
  • Yeah, and I've worked in multiple OS's, hardware platforms, and three different industries. Worth noting that I've had very few opportunities to meet the people writing this code. One guy thought `memcpy` was faster, and another didn't even realize assignment was possible. And personally I've sometimes found myself following the pattern of existing code and doing it myself while certainly not preferring it. – zzxyz Jul 13 '18 at 23:04
  • @zzxyz Sorry if it seemed like I was arguing with you, or criticizing you. I do realize that there are 'more things in heaven and earth' and all that... – Steve Summit Jul 13 '18 at 23:10
  • Oh no, I didn't take it that way at all, but thanks! I was just kinda underlining that I doubt my experience is entirely unique or super limited. Perhaps unusual, though :) – zzxyz Jul 13 '18 at 23:17
  • "I don't know why anyone would advocate using memcpy instead of struct assignment": when you use a FAM you have to use memcpy, but that's a rare case. – Stargateur Jul 13 '18 at 23:18
  • @Stargateur Ah, good point. So I've softened that language. (And for anyone who's wondering what "FAM" is, as I did at first, it's a [flexible array member](https://en.wikipedia.org/wiki/Flexible_array_member).) – Steve Summit Jul 13 '18 at 23:22
  • @SteveSummit - also a clarification on your "Bottom line". I haven't really ever received that recommendation...in fact the opposite. Which was part of what made me curious about why *so much* code I inherit seems to use `memcpy`. (and I get further confused by inheriting a lot of `C` code that's been superficially ported to `C++`) – zzxyz Jul 13 '18 at 23:29
  • Note that `(*(struct {char x[n];} *)a = *(struct {char x[n];} *)b)` breaks strict aliasing rules in general. – Pascal Cuoq Jul 14 '18 at 10:11
  • @PascalCuoq That code fragment breaks all kinds of rules! But as I hope you realize, it hasn't been written since about 1982, *long* before the concept of strict aliasing had been invented, in an era when people performed all kinds of clever tricks based on the code they knew the compiler was going to emit. – Steve Summit Jul 14 '18 at 10:27
  • Accepting this answer because after removing a bunch of memsets and memcpys in some C++ code (superficially ported C code), I had to put the memsets back. The VS compiler targeting Windows CE and our somewhat old clang targeting Android both had issues. While they may have been resolvable, it was easier just to go back to memset. – zzxyz Jul 16 '18 at 18:32
  • And gcc targeting Linux refused to compile until I set C++11. I would really like to also tag C++ in this question, but people seem to flip out when the languages are cross-tagged, which I sort of get. In any event, I suspect I would've run into similar (but different) problems with the C compilers. – zzxyz Jul 16 '18 at 18:40

There is exactly one use case where

struct somestruct  foo = { 0 };

is not sufficient, and

struct somestruct  foo;
memset(&foo, 0, sizeof foo);

needs to be used instead: when the padding in the structure may be important.

You see, the only difference between the two is that the latter is guaranteed to clear structure padding to zero too, whereas the former is only guaranteed to clear structure members to zero.

The reason one might care about padding is upwards/future compatibility. If padding is guaranteed to be zero in current programs, a future version of a library can "reuse" the padding for new data fields, and still work with older binaries.
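
To make the padding point concrete, here is a minimal sketch. The struct `padded` and the helpers `padded_clear` and `all_bytes_zero` are hypothetical names introduced only for illustration, and whether `padded` actually contains padding depends on the ABI (on most common ones there are bytes between `tag` and `value`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; most ABIs insert padding after `tag`. */
struct padded {
    char tag;
    int  value;
};

/* Zero the whole object representation, padding bytes included. */
static void padded_clear(struct padded *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Return 1 if every byte of the object, padding included, is zero. */
static int all_bytes_zero(const struct padded *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    size_t i;

    for (i = 0; i < sizeof *p; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}
```

After `padded_clear()`, `all_bytes_zero()` returns 1. After `struct padded foo = { 0 };` it may or may not, depending on the compiler, and that uncertainty is exactly the difference described above.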

Since C99, new C libraries do, and should, reserve some members for just that purpose explicitly. That's usually why you see "reserved" fields in structures defined by many libraries, and even in the Linux kernel-userspace interface. So, the padding issue is really only relevant to structures developed before C99 support became widespread; in other words, in old libraries only.

The one structure I know of that should always be cleared using memset() is struct sigaction, defined in POSIX.1. In most POSIXy systems it is a perfectly normal structure (and so code that just clears the members of the structure will work absolutely fine on those systems), but because of the various different implementations at different times (especially how the signal mask is implemented), I believe there are still systems with C libraries that have a version of the structure where clearing the padding is still important.

(This is because the sa_handler and sa_sigaction members are usually in a union, and/or because the definition of sigset_t may have changed.)
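
The usual shape of that idiom looks roughly like the sketch below. The names `on_signal`, `got_signal`, and `install_handler` are hypothetical; the `sigaction()`, `sigemptyset()`, and `memset()` calls are the standard POSIX/C ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: just records that a signal arrived. */
static void on_signal(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Install on_signal for `signum`. Returns 0 on success, -1 on error. */
static int install_handler(int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);  /* clear all members and any padding */
    sigemptyset(&act.sa_mask);    /* block no additional signals in the handler */
    act.sa_handler = on_signal;
    act.sa_flags = 0;

    return sigaction(signum, &act, NULL);
}
```

Calling `install_handler(SIGUSR1)` followed by `raise(SIGUSR1)` should leave `got_signal` set to 1 in a single-threaded program.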

It is possible there are others in some other older libraries, so I would recommend using the memset() idiom when working with libraries with pre-1999 roots, whose example code also uses it.

Nominal Animal

Well, it's not true that struct assignment and "zero-initialization" are always preferable to memset() and memcpy(). It may just as well be that a compiler does a better job optimizing memset()/memcpy() (having "special knowledge" of these two standard library functions), for either size or speed. Granted, that would be a little strange, but it is possible.

Also, since "zero initialization" doesn't work on heap-allocated structs, there's something to be said for consistency of always using memset().
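
For reference, initializer syntax indeed cannot be applied to memory returned by malloc(), but memset() is not the only option there. A minimal sketch, assuming a hypothetical `struct record` and hypothetical helpers `record_new` and `record_new_calloc`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example type. */
struct record {
    int     id;
    double  score;
    char   *name;
};

/* malloc() followed by assignment from a C99 compound literal:
 * every member is set to zero / a null pointer. */
static struct record *record_new(void)
{
    struct record *r = malloc(sizeof *r);

    if (r != NULL)
        *r = (struct record){0};
    return r;
}

/* calloc() returns memory with all bytes zero, like malloc() + memset(). */
static struct record *record_new_calloc(void)
{
    return calloc(1, sizeof(struct record));
}
```

The caller owns the result and releases it with `free()`. All-bytes-zero is a zero `double` and a null pointer on mainstream platforms, but the C standard only guarantees that for the member-wise form.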

The same goes for situations where you might want to copy a few adjacent structs (part of an array): again, if you always use memcpy(), the code is more consistent.
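
A sketch of that array case, where plain assignment would need a loop. The type `sample` and the helper `copy_range` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element type. */
struct sample {
    int   id;
    float value;
};

/* Copy `count` consecutive elements starting at src[first] into dst.
 * The source and destination regions must not overlap. */
static void copy_range(struct sample *dst, const struct sample *src,
                       size_t first, size_t count)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src + first, count * sizeof *src);
}
```

A single element can still be copied with `dst[0] = src[first];`; memcpy() only becomes the natural choice once several elements move at once.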

On a historical note, I've worked with toolchains where local struct initialization was broken. On projects that went through such toolchains, the "always use memset()" rule will prevail even after those toolchains are retired. The thing is, even if your toolchain's memset() is broken, you can write your own, but you can't write your own local struct initialization...

srdjan.veljkovic
  • "Also, since "zero initialization" doesn't work on heap-allocated structs, there's something to be said for consistency of always using memset()." That's what calloc is for. – Stargateur Jul 13 '18 at 23:20
  • @Stargateur Sure, `calloc()` is an option, but I didn't want to burden the answer by introducing `calloc()`. That's because it's not always the "thing to do" - for example, you may initialize to "all zeros" only in some cases, so you `malloc()` and then `if` to see how to initialize, rather than `if` and then either `malloc()` and some initialization or/else `calloc()`. – srdjan.veljkovic Jul 13 '18 at 23:26
  • `if condition then malloc else calloc`, but don't bother, it was just a note. – Stargateur Jul 13 '18 at 23:35
  • The possibility of `dst = src` when `*dst = *src` is meant when dealing with pointers (regardless of whether they are stack or heap pointers) occurs to me, too. Perhaps because I just accidentally did that. :) Of course, memcpy is hardly safe, but you would have to work at making that particular mistake. Whereas with assignment it's a simple typo. – zzxyz Jul 14 '18 at 00:33