As @MartinGallagher mentioned, your loop that fills the slice with nil is recognized by the compiler and replaced with an optimized internal memory-clearing operation, while the copy() version does more work per element and does not get this optimization.
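For context, the pattern the compiler can optimize is a plain range loop assigning the zero value, roughly like this (a minimal sketch; the slice size is just illustrative):

    package main

    import "fmt"

    func main() {
        nums := make([]*int, 12800)
        // A range loop assigning the zero value (nil here) can be detected
        // by the compiler and compiled down to a fast memory clear.
        for i := range nums {
            nums[i] = nil
        }
        fmt.Println(nums[0] == nil) // true
    }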
If you change your examples to fill with a non-nil pointer value, you'll see the loop version fall behind. Also, don't allocate (with make()) inside the benchmark loop; do that outside and call b.ResetTimer() to exclude that setup time. Your slice is also very small; if you increase its size, the difference becomes more noticeable:
    package main

    import "testing"

    // x is a non-nil pointer used as the fill value, so the compiler
    // cannot replace the loop with a memory-clearing optimization.
    var x = new(int)

    func BenchmarkSetNilOneByOne(b *testing.B) {
        nums := make([]*int, 12800)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for i := range nums {
                nums[i] = x
            }
        }
    }

    func BenchmarkSetNilInBulk(b *testing.B) {
        nils := make([]*int, 128)
        for i := range nils {
            nils[i] = x
        }
        orig := make([]*int, 12800)
        var nums []*int
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            nums = orig
            // Repeatedly copy the 128-element source into the remaining
            // (unfilled) part of the slice until it is completely filled.
            for len(nums) > 0 {
                nums = nums[copy(nums, nils):]
            }
        }
    }
Benchmark results:
    BenchmarkSetNilOneByOne-4     96571     10626 ns/op
    BenchmarkSetNilInBulk-4      266690      4023 ns/op
Also note that your "bulk" version reassigns the slice header to nums several times. There is an even faster way to fill the slice: you do not need an additional nils slice. Start filling your slice, then repeatedly copy the already-filled part onto the still-unfilled part, doubling the filled region with each copy. This also does not require changing / reassigning the nums slice header. See Is there analog of memset in go?
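A minimal sketch of that doubling-copy technique (the helper name fillPointers is just illustrative, not from the linked answer):

    package main

    import "fmt"

    var v = new(int)

    // fillPointers fills nums with v by seeding the first element and then
    // repeatedly copying the already-filled prefix onto the still-unfilled
    // part, doubling the filled region on each pass. The nums slice header
    // is never reassigned.
    func fillPointers(nums []*int, v *int) {
        if len(nums) == 0 {
            return
        }
        nums[0] = v
        for filled := 1; filled < len(nums); filled *= 2 {
            copy(nums[filled:], nums[:filled])
        }
    }

    func main() {
        nums := make([]*int, 12800)
        fillPointers(nums, v)
        fmt.Println(nums[0] == v, nums[len(nums)-1] == v) // true true
    }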