43

I have heard that C++ programmers should avoid memset.

class ArrInit {
    //! int a[1024] = { 0 };
    int a[1024];
public:
    ArrInit() {  memset(a, 0, 1024 * sizeof(int)); }
};

So, considering the code above, if you do not use memset, how could you fill a[1..1024] with zeros? What's wrong with memset in C++?

thanks.

Jichao
  • 5
    Can you give the reason as to why you think one should not do memset in C++? I don't know why doing memset should lead to any problem in C++. Please correct me if I am wrong. Thanks! – Jay Dec 29 '09 at 17:55
  • He probably heard it in the context of "don't use memset to zero-out class objects". – David R Tribble Dec 29 '09 at 18:48
  • 2
    @Jay: The above is OK. But using memset to zero the class object itself (not just a single member) is not a good idea. This is especially problematic if the object contains members that have constructors (that do some initialization). – Martin York Dec 29 '09 at 19:01
  • 1
    BTW it's a[0..1023], not a[1..1024]. – user192472 Jun 14 '10 at 13:46
  • I would recommend against using a C style array, and instead use a vector. In which case you could then replace your constructor with ArrInit() : a( 1024, 0 ) {}, which would remove the memset and make your class arguably more "C++" in style. – Kit10 May 23 '12 at 15:10
  • Using memset directly will cause issues (as detailed below); however, you could use an approach like this: http://stackoverflow.com/a/38103250/3223828 – Stuart Gillibrand Jun 29 '16 at 15:20

11 Answers

52

In C++, std::fill or std::fill_n may be a better choice, because they are generic and can therefore operate on objects as well as PODs. memset, by contrast, operates on a raw sequence of bytes and should therefore never be used to initialize non-PODs. That said, optimized implementations of std::fill may internally use specialization to call memset if the type is a POD.
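As a minimal sketch of the point above (the class name FilledArr, its members, and the "unset" value are hypothetical, and a C++11 compiler is assumed for std::begin/std::end), the same two algorithms can fill both an int array and an array of std::string, which memset could not safely do:

```cpp
#include <algorithm>  // std::fill, std::fill_n
#include <cassert>
#include <cstddef>    // std::size_t
#include <iterator>   // std::begin, std::end
#include <string>

// Hypothetical class for illustration, shaped like the question's ArrInit.
class FilledArr {
    static const std::size_t kSize = 1024;
    int a[kSize];
    std::string names[4];  // non-POD elements: memset would be wrong here
public:
    FilledArr() {
        std::fill(std::begin(a), std::end(a), 0);     // element-wise zero
        std::fill_n(names, 4, std::string("unset"));  // works for class types too
    }
    int at(std::size_t i) const { return a[i]; }
    const std::string& name(std::size_t i) const { return names[i]; }
};

// Small usage check.
void usage_fill() {
    FilledArr f;
    assert(f.at(0) == 0 && f.at(1023) == 0);
    assert(f.name(0) == "unset");
}
```

Changing the 0 passed to std::fill to any other int fills the array with that value, which also covers the non-zero case.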

Charles Salvia
  • 1
    I forgot about std::fill so +1 to this from me. Yes, there is a c++ function specifically designed to fill containers so use it! – jcoder Dec 29 '09 at 18:19
51

The issue is not so much using memset() on the built-in types as using it on class (aka non-POD) types. Doing so will almost always do the wrong thing and frequently do the fatal thing: it may, for example, trample over a virtual function table pointer.
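One way to keep memset away from class types is to let the compiler reject them. The following is only a sketch, assuming C++11 type traits; zero_bytes, Header and Shape are hypothetical names introduced for illustration:

```cpp
#include <cassert>
#include <cstring>      // std::memset
#include <type_traits>  // std::is_trivial, std::is_standard_layout

// Hypothetical helper: zero an object's bytes only if the type is POD-like.
template <class T>
void zero_bytes(T& obj) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_bytes: T must be a trivial, standard-layout (POD) type");
    std::memset(&obj, 0, sizeof obj);
}

struct Header {   // POD with integer members only: all-zero bytes mean value zero
    int id;
    unsigned char tag[4];
};

struct Shape {    // has a virtual function, so it is not trivial
    virtual ~Shape() {}
    int sides;
};

static_assert(!std::is_trivial<Shape>::value, "Shape is not a POD");

void usage_zero_bytes() {
    Header h = {42, {1, 2, 3, 4}};
    zero_bytes(h);
    assert(h.id == 0 && h.tag[3] == 0);
    // Shape s; zero_bytes(s);  // would be rejected at compile time
}
```

With this guard, the commented-out call on Shape fails the static_assert instead of silently overwriting whatever hidden data the implementation stores in the object.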

  • 7
    Using memset on any class with a virtual function is likely to be bad. – David Thornley Dec 29 '09 at 18:40
  • @Otto:because sizeof(class) would treat virtual function table pointer as one data member. – Jichao Dec 29 '09 at 18:50
  • Or on any class that contains a non-pod type, such as a string –  Dec 29 '09 at 18:51
  • 2
    `memset` is also problematic when used on some POD types, like pointers and floating point types. Setting all the bytes to 0 will not portably set pointers to NULL or floating point types to 0.0. – Adrian McCarthy Dec 29 '09 at 19:15
  • 2
    @toto: POD stands for "Plain Old Data". Essentially it refers to built-in types or structs or unions of built-in types. If you can declare it in C, it's probably a POD in C++. – Adrian McCarthy Dec 29 '09 at 19:16
  • POD means "plain old data" types without (non-trivial) constructors or destructors. –  Dec 29 '09 at 19:16
  • C++ already has a generic replacement for memset: `std::fill`. So yes, a C++ programmer should avoid memset. – jalf Dec 30 '09 at 01:39
  • so is virtual function pointer like an implicit member of a class? – theactiveactor Dec 30 '09 at 03:20
24

Zero-initializing should look like this:

class ArrInit {
    int a[1024];
public:
    ArrInit(): a() { }
};
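If a C++11 (or later) compiler is available, which is an assumption and not something the question states, the commented-out idea from the question can also be written as a default member initializer. A small sketch, with ArrInit11 as a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>  // std::size_t

// Hypothetical variant of the question's class, assuming C++11 or later.
class ArrInit11 {
    int a[1024] = {};  // default member initializer: every element is zero
public:
    int at(std::size_t i) const { return a[i]; }
};

void usage_member_init() {
    ArrInit11 obj;  // no user-written constructor needed
    for (std::size_t i = 0; i < 1024; ++i) {
        assert(obj.at(i) == 0);
    }
}
```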

As to using memset, there are a couple of ways to make the usage more robust (as with all such functions): avoid hard-coding the array's size and type:

memset(a, 0, sizeof(a));

For extra compile-time checks it is also possible to make sure that a indeed is an array (so sizeof(a) would make sense):

template <class T, size_t N>
size_t array_bytes(const T (&)[N])  //accepts only real arrays
{
    return sizeof(T) * N;
}

ArrInit() { memset(a, 0, array_bytes(a)); }

But for non-character types, I'd imagine the only value you'd use it to fill with is 0, and zero-initialization should already be available in one way or another.
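To illustrate why 0 is effectively the only useful fill value for non-character types, here is a sketch that assumes 8-bit bytes and uses std::int32_t so the element width is fixed (the function names are hypothetical). memset writes the value into every byte, while std::fill writes it into every element:

```cpp
#include <algorithm>  // std::fill
#include <cassert>
#include <cstdint>    // std::int32_t
#include <cstring>    // std::memset
#include <iterator>   // std::begin, std::end

std::int32_t first_after_memset_one() {
    std::int32_t buf[4];
    std::memset(buf, 1, sizeof(buf));  // sets every *byte* to 0x01
    return buf[0];
}

std::int32_t first_after_fill_one() {
    std::int32_t buf[4];
    std::fill(std::begin(buf), std::end(buf), 1);  // sets every *element* to 1
    return buf[0];
}

void usage_byte_fill() {
    assert(first_after_memset_one() == 0x01010101);  // 16843009, not 1
    assert(first_after_fill_one() == 1);
}
```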

UncleBens
  • what if want to initialize the array with non-zero? – Jichao Dec 29 '09 at 18:31
  • You can put any value you want inside the braces (e.g. ArrInit(): a() {5}) and it will initialize the array with that value. – Pace Dec 29 '09 at 18:40
  • 1
    You do realize that all I have to do is change `int` in your example to some class with a virtual function, and your code is likely to wipe out the vptr, don't you? You're explaining how to cause disasters in a slightly safer way. – David Thornley Dec 29 '09 at 18:45
  • 4
    @Pace: No, you'll get a syntax error. Those braces are the ones delimiting the body of the constructor function. Even with actual array initialization syntax: "int a[1024] = { 5 };" only the elements you list will be initialized, so in this example, only a[0] will be 5, not the entire array. – Dewayne Christensen Dec 29 '09 at 19:18
13

What's wrong with memset in C++ is mostly the same thing that's wrong with memset in C. memset fills a memory region with the physical all-zero bit pattern, while in virtually 100% of cases what you need is to fill an array with logical zero values of the corresponding type. In C, memset is only guaranteed to properly initialize memory for integer types (and its validity for all integer types, as opposed to just char types, is a relatively recent guarantee added to the C language specification). It is not guaranteed to properly set floating-point values to zero, and it is not guaranteed to produce proper null pointers.

Of course, the above might be seen as excessively pedantic, since the additional standards and conventions active on the given platform might (and most certainly will) extend the applicability of memset, but I would still suggest following Occam's razor here: don't rely on any other standards and conventions unless you really, really have to. The C++ language (as well as C) offers several language-level features that let you safely initialize your aggregate objects with proper zero values of the proper type. Other answers already mentioned these features.
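A short sketch of one such language-level feature, value-initialization, applied to an aggregate with a floating-point member and a pointer member (Record and make_zeroed are hypothetical names; nullptr assumes C++11):

```cpp
#include <cassert>

struct Record {  // hypothetical aggregate with non-integer members
    int    count;
    double ratio;
    char*  label;
};

Record make_zeroed() {
    Record r = Record();  // value-initialization: each member gets its type's zero value
    return r;
}

void usage_logical_zero() {
    Record r = make_zeroed();
    assert(r.count == 0);
    assert(r.ratio == 0.0);
    assert(r.label == nullptr);  // a null pointer, whatever its bit pattern is
}
```

Unlike memset, this asks for the logical zero of each member type, so it does not depend on how the platform represents 0.0 or a null pointer.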

AnT stands with Russia
  • 1
    What is the difference between physical and logical zero? – Adil May 21 '13 at 12:13
  • @Adil Physical zero is the explicit actual "all-zeros" bit pattern in memory. Logical zero is [potentially non-zero] bit pattern that is interpreted as zero value of some type by the language (C or C++ in our case). – AnT stands with Russia Feb 10 '18 at 14:05
8

It is "bad" because you are not implementing your intent.

Your intent is to set each value in the array to zero and what you have programmed is setting an area of raw memory to zero. Yes, the two things have the same effect but it's clearer to simply write code to zero each element.

Also, it's likely no more efficient.

class ArrInit
{
public:
    ArrInit();
private:
    int a[1024];
};

ArrInit::ArrInit()
{
    for(int i = 0; i < 1024; ++i) {
        a[i] = 0;
    }
}


int main()
{
    ArrInit a;
}

Compiling this with Visual C++ 2008 (32-bit) with optimisations turned on turns the loop into:

; Line 12
    xor eax, eax
    mov ecx, 1024               ; 00000400H
    mov edi, edx
    rep stosd

Which is pretty much exactly what the memset would likely compile to anyway. But if you use memset there is no scope for the compiler to perform further optimisations, whereas by writing your intent it's possible that the compiler could perform further optimisations, for example noticing that each element is later set to something else before it is used so the initialisation can be optimised out, which it likely couldn't do nearly as easily if you had used memset.

jcoder
  • I understand of course that a default initializer will zero the array too, so this is just an example but the point stands, implement your requirements, which in this case is to set each array element to zero, rather than some other method to achieve the results unless it's the only way you can achieve other requirements such as performance – jcoder Dec 29 '09 at 18:16
  • 2
    `Which is pretty much exactly what the memset would likely compile to anyway.` Nope, memset can be much more complicated and efficient than a simple `rep stosd` – youfu Nov 07 '16 at 15:43
1

This is an OLD thread, but here's an interesting twist:

class myclass
{
  virtual void somefunc();
};

myclass onemyclass;

memset(&onemyclass,0,sizeof(myclass));

works PERFECTLY well!

However,

myclass *myptr;

myptr=&onemyclass;

memset(myptr,0,sizeof(myclass));

indeed sets the virtuals (i.e. somefunc() above) to NULL.

Given that memset is drastically faster than setting to 0 each and every member in a large class, I've been doing the first memset above for ages and never had a problem.

So the really interesting question is: how come it works? I suppose that the compiler actually starts to set the zeros BEYOND the virtual table... any idea?

  • 1
    "it doesn't crash or do anything obviously wrong that I could see" and "it works" are very much not the same thing. AFAICT both code snippets above are the same, but once you start invoking undefined behavior, all bets are off. Most likely a program that does either of the above will only (appear to) work under very specific circumstances, and will break badly in other circumstances (e.g. on a different compiler, or OS, or CPU architecture) – Jeremy Friesner Jan 04 '14 at 05:50
0

The short answer would be to use an std::vector with an initial size of 1024.

std::vector< int > a( 1024 ); // Uses the type's default constructor, "T()".

The initial value of all elements of "a" would be 0, because the std::vector(size) constructor (as well as vector::resize) value-initializes each element, i.e. gives it the value of T(). For built-in types (a.k.a. intrinsic types, or PODs), that initial value is guaranteed to be 0:

int x = int(); // x == 0

This would allow the type that "a" uses to change with minimal fuss, even to that of a class.

Most functions that take a void pointer (void*) as a parameter, such as memset, are not type safe. Ignoring an object's type, in this way, removes all C++ style semantics objects tend to rely on, such as construction, destruction and copying. memset makes assumptions about a class, which violates abstraction (not knowing or caring what is inside a class). While this violation isn't always immediately obvious, especially with intrinsic types, it can potentially lead to hard to locate bugs, especially as the code base grows and changes hands. If the type that is memset is a class with a vtable (virtual functions) it will also overwrite that data.
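Put together, a vector-based version of the question's class might look like the sketch below. ArrInitVec and its accessors are hypothetical names, and the second constructor is added only to show a non-zero fill:

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <vector>

class ArrInitVec {  // hypothetical name
    std::vector<int> a;
public:
    ArrInitVec() : a(1024) {}                         // 1024 ints, all zero
    explicit ArrInitVec(int fill) : a(1024, fill) {}  // 1024 copies of fill
    std::size_t size() const { return a.size(); }
    int at(std::size_t i) const { return a.at(i); }
};

void usage_vector() {
    ArrInitVec zeros;
    assert(zeros.size() == 1024 && zeros.at(1023) == 0);
    ArrInitVec sevens(7);
    assert(sevens.at(0) == 7 && sevens.at(1023) == 7);
}
```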

Kit10
0

Your code is fine. I thought the only time in C++ where memset is dangerous is when you do something along the lines of:
YourClass instance; memset(&instance, 0, sizeof(YourClass));.

I believe it might zero out internal data in your instance that the compiler created.

rui
0

In addition to badness when applied to classes, memset is also error prone. It's very easy to get the arguments out-of-order, or to forget the sizeof portion. The code will usually compile with these errors, and quietly do the wrong thing. The symptom of the bug might not manifest until much later, making it difficult to track down.

memset is also problematic with lots of plain types, like pointers and floating point. Some programmers set all bytes to 0, assuming the pointers will then be NULL and floats will be 0.0. That's not a portable assumption.
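The "forgot the sizeof" mistake mentioned above can be sketched as follows. The function names are hypothetical, and buggy_zero is wrong on purpose: it passes an element count where memset expects a byte count, yet it compiles without complaint:

```cpp
#include <algorithm>  // std::fill_n
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstring>    // std::memset

// Bug on purpose: 'count' is a number of elements, but memset wants bytes.
void buggy_zero(int* data, std::size_t count) { std::memset(data, 0, count); }

// std::fill_n takes an element count, so there is no sizeof to forget.
void safe_zero(int* data, std::size_t count) { std::fill_n(data, count, 0); }

// Fills an array with 7, applies the given zeroing function, returns the last element.
int last_after(void (*zero)(int*, std::size_t)) {
    const std::size_t kCount = 1024;
    int a[kCount];
    std::fill_n(a, kCount, 7);
    zero(a, kCount);
    return a[kCount - 1];
}

void usage_sizeof_bug() {
    // The tail of the array is untouched whenever sizeof(int) > 1.
    assert(sizeof(int) == 1 || last_after(buggy_zero) == 7);
    assert(last_after(safe_zero) == 0);
}
```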

Adrian McCarthy
  • Setting pointers and floating-point numbers to binary zero usually works, but I wouldn't want to get into the habit. Still, the IEEE floating-point standard gets more and more entrenched, and that interprets all-bits-zero as 0.0. – David Thornley Dec 29 '09 at 22:59
  • @David: Yup, it usually works, but someday you'll be on a platform where it doesn't. – Adrian McCarthy Dec 30 '09 at 16:29
0

There's no real reason to not use it except for the few cases people pointed out that no one would use anyway, but there's no real benefit to using it either unless you are filling memguards or something.

-5

In C++ you should use new. In the case of simple arrays like in your example, there is no real problem with using it. However, if you had an array of classes and used memset to initialize it, you wouldn't be constructing the classes properly.

Consider this:

class A {
    int i;
public:
    A() : i(5) {}
};

int main() {
    A a[10];
    memset (a, 0, 10 * sizeof (A));
}

The constructor for each of those elements will not be called, so the member variable i will not be set to 5. If you used new instead:

 A* a = new A[10];

then each element in the array will have its constructor called and i will be set to 5.

Casey
  • I missed the question about initializing it to zero, and was focused on difference between memset and new. – Casey Dec 29 '09 at 18:01
  • 1
    @Casey: `A a[1]` in my g++ compiler does call the constructor, and the member variable i will be set to 5. – Jichao Dec 29 '09 at 18:26
  • 3
    `A a[10] = new A[10];` is not valid C++. You seem to be confusing C++ with another language. –  Dec 29 '09 at 19:35
  • Of course in `A a[10]`, the constructor of `A` is called for each of the ten instances created. This answer should be deleted. – YSC Oct 16 '18 at 15:32