Yes, use of calloc()-allocated memory will suffer a performance degradation due to the Meltdown and Spectre patches.

In fact, calloc() isn't special here: malloc(), new, and more generally all allocated memory will probably suffer approximately the same performance impact. Both calloc() and malloc() are ultimately backed by pages returned by the OS (although the allocator will re-use them after they are freed). The only real difference is that a smart allocator, when it goes down the path of using new pages from the OS (rather than re-using a previously freed allocation), can omit the zeroing in the calloc() case, because the OS-provided pages are guaranteed to be zero. Other than that the allocator behavior is largely the same, and the OS-level zeroing behavior is the same (there is usually no option to ask the OS for non-zero pages).
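To make that concrete, here is a minimal sketch (my own illustration, not code from any real allocator; recycle_block() is a hypothetical stand-in for the allocator's free-list lookup) of the two paths an allocator can take when satisfying calloc:

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

// Hypothetical stand-in for the allocator's free list: pretend it is empty.
static void *recycle_block(size_t n) { (void)n; return NULL; }

// Conceptually what an allocator does for calloc(1, n).
void *sketch_calloc(size_t n) {
    void *p = recycle_block(n);
    if (p) {
        memset(p, 0, n);  // recycled memory may hold old data: zero it by hand
        return p;
    }
    // Fresh anonymous pages from the OS are guaranteed zero-filled,
    // so this path can skip the memset entirely.
    p = mmap(NULL, n, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

int main(void) {
    char *z = sketch_calloc(4096);
    return z ? z[0] : 1;  // z[0] is guaranteed to be 0 on the mmap path
}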
So the performance impact applies more broadly than you thought, but it is also likely smaller than you suggest: a page fault is already doing a lot of work, so you aren't looking at an order-of-magnitude degradation or anything. See Peter's answer on the reasons the impact is likely to be limited. I wrote this answer mostly because the answer to your headline question is still yes: there is some impact.
To estimate the impact on a malloc-heavy workload, I tried running an allocation- and page-fault-heavy test on a current kernel (4.13.0-39-generic) with the Spectre and Meltdown mitigations, as well as on an older kernel predating these mitigations.
The test code is very simple:
#include <stdlib.h>
#include <stdio.h>

#define SIZE    (40 * 1024 * 1024)
#define PG_SIZE 4096

int main() {
    char *mem = malloc(SIZE);
    // Touch one byte in each page so every iteration takes a fresh page fault
    for (volatile char *p = mem; p < mem + SIZE; p += PG_SIZE) {
        *p = 'z';
    }
    printf("pages touched: %d\npointer value : %p\n", SIZE / PG_SIZE, mem);
}
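For what it's worth, the counts below were collected by running the binary under perf stat; something like the following reproduces the setup (the exact compiler flags and source file name are my assumption, not recorded from the original runs):

gcc -O2 pagefaults.c -o pagefaults
perf stat ./pagefaults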
The results on the newer kernel were about ~3,700 cycles per page fault, and on the older kernel without mitigations around ~3,300 cycles. The overall regression, presumably due to the mitigations, was about 14%. Note that this is on Skylake hardware (i7-6700HQ), where some of the Spectre mitigations are somewhat cheaper, and where the kernel supports PCID, which makes the KPTI Meltdown mitigation cheaper. The results might be worse on different hardware.
Oddly, the results on the new kernel with the Spectre and Meltdown mitigations disabled at boot (using spectre_v2=off nopti) were much worse than either the new-kernel default or the old kernel, coming in at about 5,050 cycles per page fault, something like a 35% regression over the same kernel with the mitigations enabled. So something is going really wrong, performance-wise, when the mitigations are disabled.
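For reference, the cycles-per-fault figures are just total cycles divided by the page-fault count from the perf output below:

33,662,397 cycles / 10,286 faults ≈ 3,273 cycles/fault  (old kernel)
38,318,595 cycles / 10,288 faults ≈ 3,725 cycles/fault  (new kernel, ~14% more)
51,964,080 cycles / 10,286 faults ≈ 5,052 cycles/fault  (mitigations off, ~35% more than mitigated)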
Full Results
Here is the full perf stat output for the three runs.
Old Kernel (4.10.0-42)
pages touched: 10240
pointer value : 0x7f7d2561e010
Performance counter stats for './pagefaults':
12.980048 task-clock (msec) # 0.976 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
10,286 page-faults # 0.792 M/sec
33,662,397 cycles # 2.593 GHz
27,230,864 instructions # 0.81 insn per cycle
4,535,443 branches # 349.417 M/sec
11,760 branch-misses # 0.26% of all branches
0.013293417 seconds time elapsed
New Kernel (4.13.0-39)
pages touched: 10240
pointer value : 0x7f306ad69010
Performance counter stats for './pagefaults':
14.789615 task-clock (msec) # 0.966 CPUs utilized
8 context-switches # 0.541 K/sec
0 cpu-migrations # 0.000 K/sec
10,288 page-faults # 0.696 M/sec
38,318,595 cycles # 2.591 GHz
28,796,523 instructions # 0.75 insn per cycle
4,693,944 branches # 317.381 M/sec
26,853 branch-misses # 0.57% of all branches
0.015312764 seconds time elapsed
New Kernel (4.13.0-39) spectre_v2=off nopti
pages touched: 10240
pointer value : 0x7ff079ede010
Performance counter stats for './pagefaults':
16.690621 task-clock (msec) # 0.982 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
10,286 page-faults # 0.616 M/sec
51,964,080 cycles # 3.113 GHz
28,602,441 instructions # 0.55 insn per cycle
4,699,608 branches # 281.572 M/sec
25,064 branch-misses # 0.53% of all branches
0.017001581 seconds time elapsed