What performance can I expect from std::fill_n(ptr, n, 0) relative to memset?

Question

For an iterator ptr which is a pointer, std::fill_n(ptr, n, 0) should do the same thing as memset(ptr, 0, n * sizeof(*ptr)) (but see @KeithThompson's comment on this answer).

For a C++ compiler in C++11/C++14/C++17 mode, under which conditions can I expect these to be compiled to the same code? And when/if they don't compile to the same code, is there a significant performance difference with -O0? -O3?

Note: Of course some/most of the answer might be compiler-specific. I'm only interested in one or two specific compilers, but please write about the compiler(s) for which you know the answer.

FWIW MSVS uses template magic to determine if it can use memset and memcpy inside containers. Not sure if the algo's are implemented that way as well but it is not a leap to think they would have. That said, code it both ways and measure. That will give you the best answer. — NathanOliver, Dec 21 '16 at 17:58
@NathanOliver: Ok, but - memset is also a function which needs to be implemented somehow. — einpoklum, Dec 21 '16 at 18:00
(a) `std::fill_n(ptr, n, 0)` should do the same thing as `memset(ptr, 0, n * sizeof(*ptr))` and (b) only for integral types. — Barry, Dec 21 '16 at 18:01
@Barry: (a) Fixed the missing multiplication by the size, sorry. (b) I agree that it should, but - when/for which compilers/settings can I assume that it does? — einpoklum, Dec 21 '16 at 18:04
@einpoklum That's not a compiler setting thing. That equivalence is simply only true under those conditions. If `ptr` were a `string*`, then `fill_n` is not allowed for instance. — Barry, Dec 21 '16 at 18:05
Note that the two uses are semantically different, e.g., for pointers: a null pointer isn't necessarily represented using all zero bits! — Dietmar Kühl, Dec 21 '16 at 18:05
A nullpointer isn't necessarily all bits 0. I'm not sure about the standard's requirements on floating point types, but I think it's the same story there. I.e., `std::fill_n`, which fills with logical nullvalues, is not the same as `memset` with 0, in general. — Cheers and hth. - Alf, Dec 21 '16 at 18:13
In my experience, most optimizers *are* smart enough to recognize that a for-loop or standard algorithm is equivalent to one of the compiler intrinsics/__builtins, and generate the same code. I wouldn't bother unless the code is caught by the profiler. — Bo Persson, Dec 21 '16 at 18:40

score 6 · Answer 1 · answered Dec 21 '16 at 18:38

The answer depends on your implementation of the standard library.

MSVC, for example, has several implementations of std::fill_n, selected by the type of the elements you are trying to fill.

If you call std::fill_n with a char*, signed char*, or unsigned char*, it calls memset directly to fill the array:

inline char *_Fill_n(char *_Dest, size_t _Count, char _Val)
{   // copy char _Val _Count times through [_Dest, ...)
    _CSTD memset(_Dest, _Val, _Count);
    return (_Dest + _Count);
}

If you call with another type, it will fill in a loop:

template<class _OutIt,
    class _Diff,
    class _Ty> inline
_OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
{   // copy _Val _Count times through [_Dest, ...)
    for (; 0 < _Count; --_Count, (void)++_Dest)
        *_Dest = _Val;
    return (_Dest);
}
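To illustrate the kind of dispatch described above, here is a minimal sketch of how a library could route byte-sized character types to memset and everything else to a plain loop. This is not MSVC's actual code: the namespace sketch and the names my_fill_n, fill_n_impl, is_byte_like and use_memset are all hypothetical, and the sketch assumes C++11 or later.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of any real library

// Generic path: assign the value element by element.
template <class OutIt, class T>
OutIt fill_n_impl(OutIt dest, std::size_t count, const T& value, std::false_type) {
    for (; count > 0; --count, (void)++dest)
        *dest = value;
    return dest;
}

// Byte path: only selected for pointers to 1-byte character types,
// where writing bytes and assigning elements are the same operation.
template <class Byte, class T>
Byte* fill_n_impl(Byte* dest, std::size_t count, const T& value, std::true_type) {
    std::memset(dest, static_cast<unsigned char>(value), count);
    return dest + count;
}

// Trait: true only for the three narrow character types.
template <class T> struct is_byte_like : std::false_type {};
template <> struct is_byte_like<char> : std::true_type {};
template <> struct is_byte_like<signed char> : std::true_type {};
template <> struct is_byte_like<unsigned char> : std::true_type {};

// Trait: memset is used only when the output iterator is a raw pointer
// to one of those types.
template <class OutIt> struct use_memset : std::false_type {};
template <class T> struct use_memset<T*> : is_byte_like<T> {};

// Entry point: picks one of the two implementations at compile time.
template <class OutIt, class T>
OutIt my_fill_n(OutIt dest, std::size_t count, const T& value) {
    return fill_n_impl(dest, count, value, typename use_memset<OutIt>::type{});
}

}  // namespace sketch
```

With this sketch, sketch::my_fill_n(buf, n, 'x') on a char array goes through memset, while the same call on an int array runs the loop. Whether the loop is later turned into a memset call is then up to the optimizer.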

The best way to determine the overhead on your particular compiler and standard library implementation would be to profile the code with both calls.
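As a starting point for that kind of measurement, here is a rough timing sketch using std::chrono. The helper names best_time_ns, time_fill_n and time_memset are made up for this example, and a naive harness like this is easy to get wrong: an optimizer may merge or drop repeated fills, and caches and buffer size matter a lot, so treat the numbers as a hint and confirm with a real profiler or benchmarking library.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: runs f() `repeats` times and returns the best
// (smallest) wall-clock time of a single run, in nanoseconds.
// Returns -1 if repeats is not positive.
template <class F>
long long best_time_ns(F f, int repeats) {
    long long best = -1;
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        f();
        auto stop = std::chrono::steady_clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}

// Times zeroing the buffer with std::fill_n.
long long time_fill_n(std::vector<int>& buf, int repeats) {
    return best_time_ns([&buf] { std::fill_n(buf.data(), buf.size(), 0); }, repeats);
}

// Times zeroing the same kind of buffer with std::memset.
long long time_memset(std::vector<int>& buf, int repeats) {
    return best_time_ns(
        [&buf] { std::memset(buf.data(), 0, buf.size() * sizeof(int)); }, repeats);
}
```

Call both functions on buffers of the sizes you care about, at the optimisation levels you care about, and compare the results across several runs.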

score 2 · Answer 2 · answered Dec 22 '16 at 07:44

For all scenarios where memset is appropriate (i.e. all your objects are PODs), you will most likely find that the two statements are equivalent when any level of optimisation is enabled.

For scenarios where memset is not appropriate, comparison is moot because the use of memset would result in an incorrect program.
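A related case where the two calls stop being interchangeable even for plain int is a non-zero fill value: std::fill_n assigns the value to each element, while memset writes the value (converted to unsigned char) into each byte. The small sketch below shows the difference; the function names fill_elements and fill_bytes are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Every ELEMENT becomes `value`.
inline void fill_elements(int* p, std::size_t n, int value) {
    std::fill_n(p, n, value);
}

// Every BYTE becomes (unsigned char)value, so for value != 0 the
// resulting ints generally do not equal `value`.
inline void fill_bytes(int* p, std::size_t n, int value) {
    std::memset(p, value, n * sizeof(int));
}
```

With value 0 both functions leave the array in the same state. With value 1, fill_elements stores 1 in each int, whereas on a typical platform with 4-byte int, fill_bytes leaves each element holding 0x01010101.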

You can easily check for yourself using tools such as godbolt (and many others):

For example, on GCC 6.2 these two functions generate literally identical code at optimisation level -O3:

#include <algorithm>
#include <cstring>

__attribute__((noinline))
void test1(int (&x) [100])
{
  std::fill_n(&x[0], 100, 0);
}

__attribute__((noinline))
void test2(int (&x) [100])
{
  std::memset(&x[0], 0, 100 * sizeof(int));
}

int main()
{
  int x[100];
  test1(x);
  test2(x);
}

https://godbolt.org/g/JIwI5l

Nice quick technique using godbolt to compare the assembly code produced for different versions of the function. Thanks! — rpattabi, Aug 21 '20 at 13:38
