I have a python program that uses many processes to exercise add-in cards for a specified amount of time. I based it on an old perl program.
The program is not stressing the add-in cards as much as it should. It results in performance that is is about 60% of the old perl program, and it should be at least 2x the performance of the old perl program because I'm sending data through memory instead of to disc drives.
I noticed that the %system from iostat -c -d -x -t -m /dev/md127 2
is around 20-25% on the new python program, and 5-10% on the old perl program, so I ran perf based on this question: High Kernel CPU when running multiple python progams
These are the results from perf record -a -g python ...
and perf report -g graph
on the new python program. I have tried this on two different systems, and both show a high "ktime_get" and "read_hpet".
My program uses multiprocessing.pool to run while time.time() < numberOfMinutes:
but the perf results below is from a run where I set the pool size to 1.
Is something going on where I have a lot of competition for the high precision timer on these systems?
Samples: 388K of event 'cycles', Event count (approx.): 1345230671182
Children Self Command Shared Object Symbol ◆
+ 35.81% 0.24% swapper [kernel.kallsyms] [k] cpu_startup_entry ¦
+ 29.27% 0.00% swapper [kernel.kallsyms] [k] start_secondary ¦
- 27.64% 0.15% swapper [kernel.kallsyms] [k] ktime_get ¦
- ktime_get ¦
+ 10.93% clockevents_program_event ¦
+ 5.23% cpuidle_enter_state ¦
+ 4.02% sched_clock_tick ¦
+ 3.55% tick_nohz_idle_exit ¦
+ 3.35% __tick_nohz_idle_enter ¦
+ 0.43% tick_check_idle ¦
+ 0.05% tick_sched_timer ¦
+ 0.04% clockevents_program_min_delta ¦
+ 0.01% cpuidle_idle_call ¦
+ 0.01% tick_program_event ¦
+ 0.00% intel_pstate_timer_func ¦
+ 0.00% tick_nohz_idle_enter ¦
+ 0.00% cpu_startup_entry ¦
+ 0.00% sched_clock_idle_wakeup_event ¦
+ 0.00% poll_idle ¦
+ 0.00% irq_enter ¦
- 27.60% 27.60% swapper [kernel.kallsyms] [k] read_hpet ¦
- 27.60% 27.60% swapper [kernel.kallsyms] [k] read_hpet ¦
- read_hpet ¦
+ 27.49% ktime_get ¦
+ 0.07% ktime_get_update_offsets_now ¦
+ 0.01% update_wall_time ¦
+ 0.01% __tick_nohz_idle_enter ¦
+ 0.01% clockevents_program_event ¦
+ 0.01% cpuidle_enter_state ¦
+ 0.00% tick_nohz_idle_exit ¦
+ 0.00% sched_clock_tick ¦
+ 0.00% hrtimer_interrupt ¦
+ 13.79% 0.03% swapper [kernel.kallsyms] [k] tick_nohz_idle_exit ¦
+ 11.58% 0.03% swapper [kernel.kallsyms] [k] tick_program_event ¦
+ 11.57% 0.10% swapper [kernel.kallsyms] [k] clockevents_program_event ¦
+ 9.93% 0.01% swapper [kernel.kallsyms] [k] arch_cpu_idle ¦
+ 9.90% 0.08% swapper [kernel.kallsyms] [k] cpuidle_idle_call ¦
+ 9.81% 0.16% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns ¦
+ 9.79% 0.03% swapper [kernel.kallsyms] [k] __tick_nohz_idle_enter ¦
▒
Update Discovered that the high "read_hpet" is coming from a multiprocessing.manager list. I have a function that spawns a process and walks through a directory adding each file to a manager.list(). Without this, `perf record -a -g python ...' shows me the below report. Now I'm running some experiments to see if I still get slow performance without using a manager.list(). Why does building a manager.list() from os.walk() use the high precision timer so much?
Samples: 118K of event 'cycles', Event count (approx.): 344546484735
Children Self Command Shared Object Symbol
+ 12.80% 0.01% python [kernel.kallsyms] [k] system_call_fastpath
+ 9.45% 0.00% python libpython2.7.so.1.0 [.] 0xffff80585ab02ae0
+ 8.88% 0.06% python libc-2.17.so [.] __xstat64
+ 8.81% 0.03% python [kernel.kallsyms] [k] vfs_fstatat
+ 8.43% 0.01% python [kernel.kallsyms] [k] sys_newstat
+ 8.41% 0.02% python [kernel.kallsyms] [k] SYSC_newstat
+ 8.27% 0.01% python [kernel.kallsyms] [k] user_path_at
+ 8.25% 0.03% python [kernel.kallsyms] [k] user_path_at_empty
+ 8.08% 8.05% python libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
+ 8.03% 0.00% python multiarray.so [.] 0xffff8058641f12c0
+ 7.89% 0.02% python [kernel.kallsyms] [k] filename_lookup
+ 7.83% 0.10% python [kernel.kallsyms] [k] path_lookupat
+ 6.37% 0.00% python libpython2.7.so.1.0 [.] 0xffff80585ab03940
+ 4.36% 0.00% python [unknown] [.] 0000000000000000
+ 3.94% 0.00% python multiarray.so [.] 0xffff8058641f1120
+ 3.56% 0.03% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 3.56% 0.90% python [kernel.kallsyms] [k] link_path_walk
+ 3.39% 0.00% python [unknown] [.] 0x6c2f746573617461
+ 3.10% 0.04% python libc-2.17.so [.] __getdents64
+ 3.08% 0.00% python [unknown] [.] 0x6d2f746573617461
+ 2.98% 0.02% python [kernel.kallsyms] [k] lookup_slow