We have built an in-memory database that eats about 100-150 GB of RAM in a single Vec, populated like this:
let mut result = Vec::with_capacity(a_very_large_number);
while let Ok(n) = reader.read(&mut buffer) {
    if n == 0 {
        break; // read() returns Ok(0) at EOF
    }
    result.push(...);
}
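For completeness, here is a minimal compilable version of that loading pattern; the path, buffer size, element type, and capacity below are placeholder assumptions, not our real code:

use std::fs::File;
use std::io::{BufReader, Read};

fn main() -> std::io::Result<()> {
    // Placeholders: the real loader reads a much larger dump and parses
    // records out of the buffer instead of pushing raw bytes.
    let mut reader = BufReader::new(File::open("/data/dump.bin")?);
    let mut buffer = vec![0u8; 1 << 20]; // 1 MiB read buffer (assumed size)

    // Reserve the full ~150 GB up front, as in the snippet above.
    let a_very_large_number: usize = 150 << 30;
    let mut result: Vec<u8> = Vec::with_capacity(a_very_large_number);

    while let Ok(n) = reader.read(&mut buffer) {
        if n == 0 {
            break; // read() returns Ok(0) at EOF
        }
        result.extend_from_slice(&buffer[..n]);
    }
    Ok(())
}

As far as we understand, with_capacity only reserves address space here; physical pages are faulted in as the pushes first touch them.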
perf top shows that the time is mostly spent in the kernel's change_protection function:
Samples: 48K of event 'cpu-clock', Event count (approx.): 694742858
 62.45%  [kernel]       [k] change_protection
 18.18%  iron           [.] database::Database::init::h63748
  7.45%  [kernel]       [k] vm_normal_page
  4.88%  libc-2.17.so   [.] __memcpy_ssse3_back
  0.92%  [kernel]       [k] copy_user_enhanced_fast_string
  0.52%  iron           [.] memcpy@plt
The CPU time spent in this function grows as more and more data is loaded into RAM:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12383 iron 20 0 137g 91g 1372 D 76.1 37.9 27:37.00 iron
The code is running on an r3.8xlarge AWS EC2 instance, and transparent hugepages (THP) are already disabled:
[~]$ cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]
[~]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
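(For anyone reproducing this setup: the [never] setting comes either from the transparent_hugepage=never boot parameter or from the standard sysfs writes below.)

[~]$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
[~]$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag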
/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping : 4
microcode : 0x428
cpu MHz : 2500.070
cache size : 25600 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips : 5000.14
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
kernel: 3.14.35-28.38.amzn1.x86_64
The real question is: why is there so much overhead in that function?