
I am playing around with concurrent.futures in Python as a way to understand a few simple implementations that use multiprocessing. However, I've come across a very unexpected result. Before I begin, here are my system details:

  • Computer Type: Laptop w/ Windows 10
  • RAM: 8.00 GB
  • CPU: Intel(R) Core(TM) i7-6600U @ 2.60 GHz (base speed: 2.80 GHz)
  • Sockets: 1
  • Cores: 2
  • Logical Processors: 4
  • L1 cache: 128KB
  • L2 cache: 512KB
  • L3 cache: 4.0MB
      Take the following arithmetic series, which computes the mean of the n = b - a integers from a (inclusive) to b (exclusive):

          mean(a, b) = (1 / (b - a)) * Σ_{i=a}^{b-1} i = (a + b - 1) / 2

      (formula image: https://i.stack.imgur.com/Vr0Im.png)
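
      As a quick sanity check on the closed form, a direct computation over a small range gives the same value (the bounds here are just illustrative, not part of the test below):

      a, b = 3, 10                         # any small bounds work
      direct = sum(range(a, b)) / (b - a)  # definition of the mean
      closed_form = (a + b - 1) / 2        # closed form for consecutive integers
      assert direct == closed_form         # both equal 6.0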

      With this idea in mind, I create a function that computes the mean of the integers between a low bound a (inclusive) and a high bound b (exclusive). I then run a test with and without multiprocessing on a range of 500 million integers:

      import time
      import concurrent.futures
      
      def mean(a, b):
          total_sum = 0
          for next_int in range(a, b):
              total_sum += next_int
          return total_sum / (b - a)
      
      if __name__ == '__main__':
          n = 500000000              # 500 Million
          wall_time = time.time()
          base_ans = mean(0, n)      # From 0 to n-1.
          print("Single Thread Time: " + str(time.time() - wall_time) + " sec.")
      
          work = [(0, int(n/2)), (int(n/2), n)]
          num_workers = 2            # One process per core!
          test_ans = 0
          wall_time = time.time()
      
          with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
              future_tasks = {executor.submit(mean, job[0], job[1]): job for job in work}
              for future in concurrent.futures.as_completed(future_tasks):
                  test_ans += future.result()
      
          print("Multiprocessing Time: " + str(time.time() - wall_time) + " sec.")
          print(str(base_ans) + " == " + str(test_ans / num_workers) + " => " + str(base_ans == (test_ans / num_workers)))
      

      This produces the following output:

      Single Thread Time: 41.0769419670105 sec.     # CPU Utilization ≈ 35% (from task manager)
      Multiprocessing Time: 24.71605634689331 sec.  # CPU Utilization ≈ 70% (from task manager)
      

      As we can clearly see, a major speed-up was observed (roughly 1.66x). However, if I create 4 workers instead of 2, I get an even greater speed-up:

      work = [(0, int(n/4)), (int(n/4), int(n/2)), (int(n/2), int(3*n/4)), (int(3*n/4), n)]
      num_workers = 4
      # ...
      Single Thread Time: 41.51883292198181 sec.     # CPU Utilization ≈ 35% (from task manager)
      Multiprocessing Time: 18.18532919883728 sec.  # CPU Utilization = 100% (from task manager)
      

      An even greater speed-up can be seen here (roughly 2.28x), and it is even consistent over many runs!
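
      To try other worker counts without hand-writing the chunk boundaries each time, the work list can be generated. This is just a sketch of a helper I could use (it is not part of the timings above); note that averaging the per-chunk means, as test_ans / num_workers does, is only exact when every chunk has the same length, i.e. when n is divisible by num_workers:

      def make_work(n, num_workers):
          # Split [0, n) into num_workers contiguous, (nearly) equal-sized chunks.
          bounds = [i * n // num_workers for i in range(num_workers + 1)]
          return list(zip(bounds[:-1], bounds[1:]))

      work = make_work(n, num_workers)   # e.g. make_work(500000000, 4)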

      1. Since only two processes can run simultaneously on this system with two physical cores, is the efficiency of the Windows scheduler the reason for this continued speed-up?
      2. How can I choose a max_workers value that provides the fastest runtime (see the timing sketch after this list)? How many more processes should I add past the physical core count?
      3. And lastly, does adding more processes past the physical core count prevent the threads (in multithreading) within each process from running efficiently?
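
      For question 2 in particular, the only approach I can think of is to measure it. Below is a rough sketch of the sweep I would run; it reuses the mean function from above in the same module and os.cpu_count() (which reports logical processors), and the upper bound of the sweep is just a guess on my part:

      import os
      import time
      import concurrent.futures

      def time_pool(num_workers, n=500000000):
          # Time the same mean computation split across num_workers processes.
          bounds = [i * n // num_workers for i in range(num_workers + 1)]
          work = list(zip(bounds[:-1], bounds[1:]))
          start = time.time()
          with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
              futures = [executor.submit(mean, a, b) for a, b in work]
              for future in concurrent.futures.as_completed(futures):
                  future.result()        # wait for every chunk to finish
          return time.time() - start

      if __name__ == '__main__':
          for workers in range(1, 2 * os.cpu_count() + 1):
              print(str(workers) + " workers: " + str(time_pool(workers)) + " sec.")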
      Code Doggo
      • You realize logical processors do provide an actual speed up in many cases, right? Sure, the gain isn't as big, but it's non-zero (up to 30% in ideal cases). They didn't just make [hyperthreaded chips for no reason](https://en.wikipedia.org/wiki/Hyper-threading#Performance_claims). – ShadowRanger Jun 11 '18 at 14:18
      • This answer may be relevant: https://stackoverflow.com/questions/1718465/optimal-number-of-threads-per-core – Ed Smith Jun 11 '18 at 14:19

      0 Answers