
I have a Python program that uses many processes to exercise add-in cards for a specified amount of time. It is based on an old Perl program.

The program is not stressing the add-in cards as much as it should. It achieves only about 60% of the old Perl program's throughput, and it should be at least 2x faster, because I'm sending data through memory instead of to disk drives.

I noticed that the %system reported by `iostat -c -d -x -t -m /dev/md127 2` is around 20-25% with the new Python program, versus 5-10% with the old Perl program, so I ran perf based on this question: High Kernel CPU when running multiple python programs

These are the results from `perf record -a -g python ...` and `perf report -g graph` for the new Python program. I have tried this on two different systems, and both show high `ktime_get` and `read_hpet`.

My program uses a multiprocessing.Pool to run a `while time.time() < numberOfMinutes:` loop in each worker, but the perf results below are from a run where I set the pool size to 1.
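
For context, each worker's loop has roughly this shape (a simplified sketch with placeholder names such as `exercise_card` and `RUN_MINUTES`, not the real program):

```python
import time
from multiprocessing import Pool

RUN_MINUTES = 1  # placeholder; the real run length comes from the command line

def exercise_card(card_id):
    """Drive one add-in card until the time budget runs out (simplified)."""
    end_time = time.time() + RUN_MINUTES * 60
    while time.time() < end_time:
        pass  # the real work: push data through memory to the card

if __name__ == '__main__':
    pool = Pool(processes=1)           # pool size 1 for the perf run shown below
    pool.map(exercise_card, range(1))  # one task per card in the real program
    pool.close()
    pool.join()
```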

Is something going on where there is a lot of contention for the high-precision timer on these systems?
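
One thing I can check is which clocksource the kernel has actually selected; as far as I understand, if it is hpet rather than tsc, every `ktime_get()` has to read the shared HPET. A minimal check using the standard sysfs files:

```python
# Which clocksource is the kernel using right now, and which are available?
# If current_clocksource says "hpet", kernel timekeeping (ktime_get) reads the
# slow, globally shared HPET instead of the per-core TSC.
base = '/sys/devices/system/clocksource/clocksource0/'
for name in ('current_clocksource', 'available_clocksource'):
    with open(base + name) as f:
        print(name + ': ' + f.read().strip())
```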

Samples: 388K of event 'cycles', Event count (approx.): 1345230671182
  Children      Self  Command          Shared Object                 Symbol
+   35.81%     0.24%  swapper          [kernel.kallsyms]             [k] cpu_startup_entry
+   29.27%     0.00%  swapper          [kernel.kallsyms]             [k] start_secondary
-   27.64%     0.15%  swapper          [kernel.kallsyms]             [k] ktime_get
   - ktime_get
      + 10.93% clockevents_program_event
      + 5.23% cpuidle_enter_state
      + 4.02% sched_clock_tick
      + 3.55% tick_nohz_idle_exit
      + 3.35% __tick_nohz_idle_enter
      + 0.43% tick_check_idle
      + 0.05% tick_sched_timer
      + 0.04% clockevents_program_min_delta
      + 0.01% cpuidle_idle_call
      + 0.01% tick_program_event
      + 0.00% intel_pstate_timer_func
      + 0.00% tick_nohz_idle_enter
      + 0.00% cpu_startup_entry
      + 0.00% sched_clock_idle_wakeup_event
      + 0.00% poll_idle
      + 0.00% irq_enter
-   27.60%    27.60%  swapper          [kernel.kallsyms]             [k] read_hpet
   - read_hpet
      + 27.49% ktime_get
      + 0.07% ktime_get_update_offsets_now
      + 0.01% update_wall_time
      + 0.01% __tick_nohz_idle_enter
      + 0.01% clockevents_program_event
      + 0.01% cpuidle_enter_state
      + 0.00% tick_nohz_idle_exit
      + 0.00% sched_clock_tick
      + 0.00% hrtimer_interrupt
+   13.79%     0.03%  swapper          [kernel.kallsyms]             [k] tick_nohz_idle_exit
+   11.58%     0.03%  swapper          [kernel.kallsyms]             [k] tick_program_event
+   11.57%     0.10%  swapper          [kernel.kallsyms]             [k] clockevents_program_event
+    9.93%     0.01%  swapper          [kernel.kallsyms]             [k] arch_cpu_idle
+    9.90%     0.08%  swapper          [kernel.kallsyms]             [k] cpuidle_idle_call
+    9.81%     0.16%  swapper          [kernel.kallsyms]             [k] hrtimer_start_range_ns
+    9.79%     0.03%  swapper          [kernel.kallsyms]             [k] __tick_nohz_idle_enter

Update: I discovered that the high `read_hpet` is coming from a multiprocessing.Manager list. I have a function that spawns a process and walks through a directory, adding each file to a manager.list(). Without that, `perf record -a -g python ...` shows the report below. Now I'm running experiments to see whether I still get slow performance without using the manager.list(). Why does building a manager.list() from os.walk() use the high-precision timer so much?
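
The list-building code has roughly this shape (a simplified sketch; the root path and names are placeholders, not the real program). As far as I can tell, every .append() on the proxy is a separate round trip to the manager process, which might be where the extra kernel time is going:

```python
import os
from multiprocessing import Manager, Process

def build_file_list(root, file_list):
    """Walk the tree and push every file path onto the shared Manager list.

    Each .append() is a separate message to the manager process, so a large
    tree means a lot of small IPC round trips.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            file_list.append(os.path.join(dirpath, name))

if __name__ == '__main__':
    manager = Manager()
    file_list = manager.list()  # proxy object; every call goes over a pipe/socket
    p = Process(target=build_file_list, args=('/some/dataset', file_list))
    p.start()
    p.join()
    print(len(file_list))
```

One of the experiments is to collect the paths into a plain local list first and extend() the manager.list() once at the end, which should avoid most of those per-item round trips.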

Samples: 118K of event 'cycles', Event count (approx.): 344546484735                                             
  Children      Self  Command         Shared Object             Symbol                                           
+   12.80%     0.01%  python          [kernel.kallsyms]         [k] system_call_fastpath
+    9.45%     0.00%  python          libpython2.7.so.1.0       [.] 0xffff80585ab02ae0
+    8.88%     0.06%  python          libc-2.17.so              [.] __xstat64
+    8.81%     0.03%  python          [kernel.kallsyms]         [k] vfs_fstatat
+    8.43%     0.01%  python          [kernel.kallsyms]         [k] sys_newstat
+    8.41%     0.02%  python          [kernel.kallsyms]         [k] SYSC_newstat
+    8.27%     0.01%  python          [kernel.kallsyms]         [k] user_path_at
+    8.25%     0.03%  python          [kernel.kallsyms]         [k] user_path_at_empty
+    8.08%     8.05%  python          libpython2.7.so.1.0       [.] PyEval_EvalFrameEx
+    8.03%     0.00%  python          multiarray.so             [.] 0xffff8058641f12c0
+    7.89%     0.02%  python          [kernel.kallsyms]         [k] filename_lookup
+    7.83%     0.10%  python          [kernel.kallsyms]         [k] path_lookupat
+    6.37%     0.00%  python          libpython2.7.so.1.0       [.] 0xffff80585ab03940
+    4.36%     0.00%  python          [unknown]                 [.] 0000000000000000
+    3.94%     0.00%  python          multiarray.so             [.] 0xffff8058641f1120
+    3.56%     0.03%  swapper         [kernel.kallsyms]         [k] cpu_startup_entry
+    3.56%     0.90%  python          [kernel.kallsyms]         [k] link_path_walk
+    3.39%     0.00%  python          [unknown]                 [.] 0x6c2f746573617461
+    3.10%     0.04%  python          libc-2.17.so              [.] __getdents64
+    3.08%     0.00%  python          [unknown]                 [.] 0x6d2f746573617461
+    2.98%     0.02%  python          [kernel.kallsyms]         [k] lookup_slow