I am doing some Monte Carlo simulations with the following code.
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#define L 20
#define N 200000
int main() {
struct timespec ts_start, ts_end;
clock_t clk_start, clk_end;
struct rusage usage;
struct timeval tv_ustart, tv_uend;
struct timeval tv_sstart, tv_send;
clock_gettime(CLOCK_MONOTONIC, &ts_start);
clk_start = clock();
getrusage(RUSAGE_SELF, &usage);
tv_ustart = usage.ru_utime;
tv_sstart = usage.ru_stime;
/*
* the runtime of one simulation may vary a lot, so let's repeat it many times.
*/
for (int iteration = 0; iteration < N; iteration += 1) {
int node_num_by_depth[L + 1] = {1};
for (int level = 0; level < L; level += 1) {
for (int depth = level; depth != -1; depth -= 1) {
int parent_num = node_num_by_depth[depth];
node_num_by_depth[depth] = 0;
for (int parent = 0; parent < parent_num; parent += 1) {
int child_num = 1 + arc4random() % 2;
int child_depth = depth + (child_num > 1);
node_num_by_depth[child_depth] += child_num;
}
}
}
}
clock_gettime(CLOCK_MONOTONIC, &ts_end);
clk_end = clock();
getrusage(RUSAGE_SELF, &usage);
tv_uend = usage.ru_utime;
tv_send = usage.ru_utime;
double elapsed = (double) (ts_end.tv_sec - ts_start.tv_sec)
+ (double) (ts_end.tv_nsec - ts_start.tv_nsec) / 1E9;
printf("Wall clock time elapsed: %g\n", elapsed);
double cpu_time_used = ((double) (clk_end - clk_start)) / CLOCKS_PER_SEC;
printf("CPU time elapsed: %g\n", cpu_time_used);
printf("User CPU time elapsed: %lu.%06u\n",
tv_uend.tv_sec - tv_ustart.tv_sec,
tv_uend.tv_usec - tv_ustart.tv_usec);
printf("System CPU time elapsed: %lu.%06u\n",
tv_send.tv_sec - tv_sstart.tv_sec,
tv_send.tv_usec - tv_sstart.tv_usec);
}
When running on my MacBook Pro with 2.3GHz dual-core Intel Core i5, Turbo Boost up to 3.6GHz, with 64MB of eDRAM
, the output is
Wall clock time elapsed: 32.408
CPU time elapsed: 32.3566
User CPU time elapsed: 32.319151
System CPU time elapsed: 32.319114
When running on my FreeBSD server with CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz (2499.96-MHz K8-class CPU)
, it gives me
Wall clock time elapsed: 396.563
CPU time elapsed: 396.414
User CPU time elapsed: 158.377633
System CPU time elapsed: 158.376629
In both cases, the compilation instruction I used is clang main.c -Ofast -march=native -o main
.
Initially, the server was running Ubuntu 18.04, but I switched to FreeBSD after realizing that my program runs ridiculously slow. However, even on FreeBSD (where arc4random
is provided natively), it is still significantly less efficient. Now, unless there is a very good reason, I don't want to switch to a new OS again.
I have profiled the program with gprof
, and you can find the result here. We can see arc4random
is taking quite a lot of time (30.8%), but this can't be the sole reason for a tenfold performance penalty.
I have added some other measures of time as pointed out by the comments. You can see there is a huge gap between the wall clock time and the CPU time. Can I narrow it? Anyway, what I care is how long it takes to execute my program, instead of how many CPU cycles it takes.
Here is some extended information about my MacBook Pro
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro14,1
Processor Name: Intel Core i5
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Memory: 8 GB
Boot ROM Version: 184.0.0.0.0
SMC Version (system): 2.43f6
Serial Number (system): XXXXXXXXXXXX
Hardware UUID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
My server is equipped with twelve Intel Skylake Xeon Platinum 8163 2.5GHz CPUs, whose specification is shown below
I have a big find! After replacing arc4random
with random
, there is a huge performance gain on the server.
MacBook Pro with
random
Wall clock time elapsed: 5.61316 CPU time elapsed: 5.40623 User CPU time elapsed: 5.361931 System CPU time elapsed: 5.359096
Server with
random
Wall clock time elapsed: 6.69183 CPU time elapsed: 6.6875 User CPU time elapsed: 6.690042 System CPU time elapsed: 6.689085