I've tested times() and clock_gettime(CLOCK_MONOTONIC) on several machines, and the results are confusing (each API called 10M times in a single thread):
Thinkpad P50 Xeon E3-1505Mv5 [Skylake 14nm]:
times(NULL) : 450ms
clock_gettime(CLOCK_MONOTONIC): 325ms
So on Skylake, clock_gettime is faster than times.
Here is the result on a Xeon E5-2430 [Sandy Bridge 32nm]:
times(NULL) : 600ms
clock_gettime(CLOCK_MONOTONIC): 1420ms
Here times(NULL) is the faster one.
I also ran the same test on an old Thinkpad W510 i7-720QM [Clarksfield 45nm]:
times(NULL) : 1.73s
clock_gettime(CLOCK_MONOTONIC): 20.4s
It seems like newer hardware implements some new features that boost clock_gettime performance. Is that the case?
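
For reference, here is a minimal sketch of the kind of timing loop described above. It is not necessarily the exact harness that produced these numbers: the helper names (now_ms, call_times, call_clock_gettime, bench_ms) are made up for illustration, and the indirect call through a function pointer adds a small constant cost to every iteration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <time.h>
#include <sys/times.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Thin wrappers so both APIs can be passed to the same benchmark loop. */
void call_times(void)
{
    (void)times(NULL);   /* NULL buffer as in the test above (accepted on Linux) */
}

void call_clock_gettime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
}

/* Call fn `iterations` times on the calling thread; return elapsed milliseconds. */
double bench_ms(void (*fn)(void), long iterations)
{
    double start = now_ms();
    for (long i = 0; i < iterations; i++)
        fn();
    return now_ms() - start;
}
```

Calling bench_ms(call_times, 10000000L) and bench_ms(call_clock_gettime, 10000000L) from main and printing the two results gives one pair of numbers per machine. On glibc older than 2.17, clock_gettime lives in librt, so the program has to be linked with -lrt.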