
I am writing a program that has to work through roughly 1000 candidates and find the best score. Because this will be done roughly 60000 times, I need to use multiprocessing to work through the list. How would I use multiprocessing in this situation? Say that the score is calculated like this:

def get_score(a, b):
    return (a * b) / (a + b)

I always know a, but it changes on every pass through the list of candidates because the best candidate gets appended to the list. I want to iterate through the list of candidates and find the best score. A non-multiprocessing example would look like this:

import random

s = [random.randint(0, 100)]
candidates = [random.randint(0, 100) for i in range(1000)]

for i in range(60000):
    best_score = 0
    best_candidate = candidates[0]
    for j in candidates:
        if get_score(s[-1], j) > best_score:  
            best_candidate = j
            best_score = get_score(s[-1], j)
    s.append(best_candidate)

I know that I could create a function, but I feel like there is an easier way to do this. Sorry for the beginner question. :/

Fateh

2 Answers


One easy way to speed things up would be to use vectorization (as a first optimization step, rather than multiprocessing). You can achieve this by using numpy ndarrays.
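Here is a minimal sketch of what that could look like, assuming the candidates are stored in a NumPy array (the variable names below are just for illustration):

import numpy as np

candidates = np.random.randint(0, 101, size=1000)  # candidate values as an ndarray
a = 42                                             # stand-in for s[-1]

# score all 1000 candidates in one vectorized expression instead of a Python loop
scores = (a * candidates) / (a + candidates)
best_candidate = candidates[np.argmax(scores)]

The inner loop over candidates disappears entirely, which is usually a much bigger win than parallelizing the Python loop as written.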

misha

Your code has some inconsistencies, like not updating best_score while still comparing against a 0-valued best score.

Your nested-loop design makes the solution hard to parallelize, and you also didn't provide some important details, e.g. does the order of the results matter?

I'm giving a dummy multiprocessing-based solution, which splits the 60000-iteration loop across n_cpu processes in parallel and writes each partial result to a numpy array on disk. However, it's up to you how you merge the partial results.

import random
import numpy as np
import multiprocessing as mp

s = [random.randint(0, 100)]
candidates = [random.randint(0, 100) for i in range(1000)]

n_cpu = mp.cpu_count()


def get_score(a, b):
    return (a * b) / (a + b)


def partial_gen(num_segment):
    # each worker handles one segment of the 60000 iterations
    part_arr = []
    for i in range(60000 // n_cpu):
        best_score = 0
        best_candidate = candidates[0]
        for j in candidates:
            new_score = get_score(s[-1], j)
            if new_score > best_score:
                best_candidate = j
                best_score = new_score  # are you sure? you don't wanna update this?
        part_arr.append(best_candidate)
    # save this worker's partial result to its own file
    np.save(f'{num_segment}.npy', np.array(part_arr))


if __name__ == '__main__':  # guard so worker processes don't re-run the pool setup
    p = mp.Pool(n_cpu)
    p.map(partial_gen, range(n_cpu))
    p.close()
    p.join()
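One way the merge step could look, as a sketch (assuming you keep the per-segment .npy files written above and only the segment order matters):

# stitch the per-worker segments back together in segment order
merged = np.concatenate([np.load(f'{i}.npy') for i in range(n_cpu)])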
Zabir Al Nazi
  • Could I have the function `partial_gen()` return a value or should I append a value to a list? – Fateh May 04 '20 at 13:08
  • You can't do that directly; you need to use shared variables. This might help: https://stackoverflow.com/questions/10415028/how-can-i-recover-the-return-value-of-a-function-passed-to-multiprocessing-proce – Zabir Al Nazi May 04 '20 at 13:15
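For reference, a minimal sketch of the shared-variable idea mentioned in the comment above, using a multiprocessing.Manager dict that the workers write into (the worker function and its placeholder result are illustrative, not part of the answer above):

import multiprocessing as mp


def worker(num_segment, results):
    # ... compute the partial result for this segment, as in partial_gen ...
    results[num_segment] = [num_segment]  # placeholder for the real partial array


if __name__ == '__main__':
    manager = mp.Manager()
    results = manager.dict()  # shared dict visible to every worker process
    jobs = [mp.Process(target=worker, args=(i, results)) for i in range(mp.cpu_count())]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    print(dict(results))  # maps segment index -> that segment's result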