3

1:

int a[100] = {};

2:

int a[100];
memset(a, 0, sizeof(a));

3:

int a[100];
fill(a, a + 100, 0);

Of the three methods shown above, what is the best way to zero a new array, and what is the difference between them?

Gaith
  • possible duplicate of http://stackoverflow.com/questions/15719412/which-is-fastest-for-zeroing-out-an-array?rq=1 – phuclv Oct 11 '13 at 01:46
  • Is that really what you want to do, or do you want to zero an `int[]` that has been set from malloc? – Hot Licks Oct 11 '13 at 01:46
  • `std::vector<int> a(100, 0)` – LihO Oct 11 '13 at 01:47
  • @mbratch If I'm not mistaken the first example will zero the array. – orlp Oct 11 '13 at 01:49
  • @LihO: I don't see why you should use a `std::vector` if you don't need a dynamic structure. The fact that he is working with C++ doesn't implicitly imply this, nor does it mean you should use a vector under all circumstances. At most I'd say `std::array<int, 100> a{}` (see the sketch after these comments). – Jack Oct 11 '13 at 01:50
  • @Jack: What if he needs a dynamic structure and he just doesn't know it yet? I just wanted to point out that there's always another way. – LihO Oct 11 '13 at 01:51
  • @mbratch, I do not think you are right about option 3 either. – Adam Burry Oct 11 '13 at 01:56
  • @mbratch, 3 works perfectly, see http://en.cppreference.com/w/cpp/algorithm/fill. – Cramer Oct 11 '13 at 01:57
  • Sorry I misread #3. My mistake. – lurker Oct 11 '13 at 01:58
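
A minimal sketch of the container-based alternatives suggested in the comments above (the element type and size are assumptions for illustration):

#include <array>
#include <vector>

int main()
{
    std::vector<int> v(100, 0);    // dynamic size: 100 ints, all zero
    std::array<int, 100> a{};      // fixed size: value-initialized, all elements zero
    return v[0] + a[0];
}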

3 Answers

4

1: The best. Every element is value-initialized, which for built-in types such as int means zero.

2: This is dangerous: it fills the array with the byte value 0, not with a value of the element type. For int that happens to give zeros, but if the array holds, say, float, there is no guarantee that an all-zero-bytes pattern represents the value zero. Also, memset writes byte-wise, NOT element-wise, which causes all sorts of issues if you pass it something other than zero: for example memset(a, 1, ...) fills each int with 16843009 (0x01010101). memset should not be used unless you are working with C-strings or raw bytes (a short sketch follows after point 3).

3: Legal and legible, and easy to extend to non-zero values, which (1) is not, although it is more verbose.
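
A minimal sketch contrasting the three approaches and the byte-wise memset surprise from point 2 (the array size and the printed check are illustrative):

#include <algorithm>
#include <cstdio>
#include <cstring>

int main()
{
    int a[100] = {};                 // (1) value-initialization: every element is 0

    int b[100];
    std::memset(b, 1, sizeof(b));    // (2) byte-wise fill: every int becomes 0x01010101 == 16843009

    int c[100];
    std::fill(c, c + 100, 0);        // (3) element-wise assignment of the value 0

    std::printf("%d %d %d\n", a[0], b[0], c[0]);    // prints: 0 16843009 0
    return 0;
}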

Cramer
  • It is legal to use `std::fill` in the way proposed by the OP. – Jack Oct 11 '13 at 01:52
  • You might remark on possible performance differences between 1 and 3. I'd expect 1 to have more chance of being faster. – Keith Oct 11 '13 at 01:59
  • @Keith, I was wondering that too, but it is hard to see how it could be. Those 100 ints on the stack have to be zeroed somehow, and marching over them seems to be the only choice. – Adam Burry Oct 11 '13 at 02:02
  • @AdamBurry I'd think that the compiler may be better optimized for that very standard case; but only profiling will tell. – Keith Oct 11 '13 at 04:48
  • @AdamBurry: You may think so, but the compiler can eliminate or at least delay writes. It's not a `volatile` array. – MSalters Oct 11 '13 at 08:04
2

I decided to investigate the performance question using VS2010 with full optimization.

Interesting result:

1: 13105

2: 13044

3: 4546

No initialization case: 906.

So it looks very much like VS2010 uses memset for case 1, and that fill is better optimized.

#include "stdafx.h"
#include <Windows.h>
#include <algorithm>
#include <iostream>

int fref()    // reference case: the array is left uninitialized
{
    int a[1024];
    return a[512] - a[256];
}

int f1()
{
    int a[1024] = {};

    return a[512] - a[256];
}

int  f2()
{
    int a[1024];
    memset(a, 0, sizeof(a));
    return a[512] - a[256];
}

int f3()
{
    int a[1024];
    std::fill(a, a + 100, 0);    // fills only the first 100 of the 1024 elements (the call is copied from the question)
    return a[512] - a[256];
}

typedef int (*Function)();

LONGLONG time(Function function)
{
    const unsigned numLoops = 50000;
    LARGE_INTEGER start;
    QueryPerformanceCounter(&start);
    for(unsigned j = 1; j != numLoops; ++j)
        function();
    LARGE_INTEGER end;
    QueryPerformanceCounter(&end);
    return end.QuadPart-start.QuadPart;
}

Function tests[]= 
{
    &fref, &f1, &f2, &f3
};

const unsigned numTests = sizeof(tests)/sizeof(tests[0]);

LONGLONG results[numTests] = {};


int _tmain(int argc, _TCHAR* argv[])
{
    for(unsigned i = 0; i != numTests; ++i)
    {
        results[i] = time(tests[i]);
    }
    for(unsigned i = 0; i != numTests; ++i)
        std::cout << results[i] << std::endl;
    getchar();
    return 0;
}
Keith
  • Why is this so? I would have thought memset would be the most low level way to do this (and hence by far more efficient). Very suspicious... – Sellorio Oct 11 '13 at 05:16
  • My guess is that `memset` is slow because it assigns on a byte-by-byte basis while `fill` fills on a word-by-word basis. Another reason you shouldn't use `memset` for anything apart from C-strings. – Cramer Oct 11 '13 at 05:49
  • `memset` doesn't know that its arguments are 32 bit aligned. It typically *will* use word-by-word writes, but needs special code for the start and end of the range. For a larger range, those costs will be relatively low, but `int[100]` isn't that big. – MSalters Oct 11 '13 at 08:01
1

Using Keith's sample code, the differences under GCC are as follows:

GCC 4.7.3: g++ -Wall -Wextra -std=c++0x -O3 -c array-fill.cpp

#include <algorithm>
#include <cstring>

int  fref() {
    int a[1024];
    return a[512] - a[256]; }

int f1() {
    int a[1024] = {};
    return a[512] - a[256]; }

int  f2() {
    int a[1024];
    std::memset(a, 0, sizeof(a));
    return a[512] - a[256]; }

int f3() {
    int a[1024];
    std::fill(a, a + 100, 0);
    return a[512] - a[256]; }

Disassemble with:

objdump -d array-fill.o | c++filt

00000000 <fref()>:
   0:   b8 00 10 00 00          mov    $0x1000,%eax
   5:   e8 00 00 00 00          call   a <fref()+0xa>
   a:   29 c4                   sub    %eax,%esp
   c:   8b 84 24 00 08 00 00    mov    0x800(%esp),%eax
  13:   2b 84 24 00 04 00 00    sub    0x400(%esp),%eax
  1a:   81 c4 00 10 00 00       add    $0x1000,%esp
  20:   c3                      ret
  21:   eb 0d                   jmp    30 <f1()>

00000030 <f1()>:
  30:   31 c0                   xor    %eax,%eax
  32:   c3                      ret
  33:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  39:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000040 <f2()>:
  40:   b8 00 10 00 00          mov    $0x1000,%eax
  45:   e8 00 00 00 00          call   4a <f2()+0xa>
  4a:   29 c4                   sub    %eax,%esp
  4c:   31 c0                   xor    %eax,%eax
  4e:   81 c4 00 10 00 00       add    $0x1000,%esp
  54:   c3                      ret
  55:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  59:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

00000060 <f3()>:
  60:   b8 00 10 00 00          mov    $0x1000,%eax
  65:   e8 00 00 00 00          call   6a <f3()+0xa>
  6a:   29 c4                   sub    %eax,%esp
  6c:   89 e0                   mov    %esp,%eax
  6e:   8d 94 24 90 01 00 00    lea    0x190(%esp),%edx
  75:   c7 00 00 00 00 00       movl   $0x0,(%eax)
  7b:   83 c0 04                add    $0x4,%eax
  7e:   39 d0                   cmp    %edx,%eax
  80:   75 f3                   jne    75 <f3()+0x15>
  82:   8b 84 24 00 08 00 00    mov    0x800(%esp),%eax
  89:   2b 84 24 00 04 00 00    sub    0x400(%esp),%eax
  90:   81 c4 00 10 00 00       add    $0x1000,%esp
  96:   c3                      ret

The C-style initialization (f1) certainly allowed optimization in this case: the whole function collapses to `xor %eax,%eax; ret`, so the array and its zeroing are eliminated entirely!

Adam Burry