I'm working on a project in which some processes take a long time to finish (> 2 hours in total), and some of them are clearly parallelizable. Here are two examples:
for n in range(images):
    entry = ImD.ImageData(width, height)
    entry.interpolate_points(seeds)
    entries.append(entry)
def interpolate_points(self, seeds):
    points = []
    f = []
    for i in range(seeds):
        # Generate a cell position
        pos_x = random.randrange(self.width)
        pos_y = random.randrange(self.height)
        # Save the f(x,y) data
        x = Utils.translate_range(pos_x, 0, self.width, self.range_min, self.range_max)
        y = Utils.translate_range(pos_y, 0, self.height, self.range_min, self.range_max)
        z = Utils.function(x, y)
        points.append([x, y])
        f.append(z)
    for x in range(self.width):
        xt = Utils.translate_range(x, 0, self.width, self.range_min, self.range_max)
        for y in range(self.height):
            yt = Utils.translate_range(y, 0, self.height, self.range_min, self.range_max)
            self.data[x][y] = Utils.shepard_euclidian(points, f, [xt, yt], 3)
The interpolate_points method takes a significant amount of time to complete, and since I call it more than 40 times, I believe some of these calls could run in parallel.
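For reference, the pattern I understand is needed here is a module-level worker function that Pool can pickle. This is a minimal sketch of that pattern, with a stand-in build_entry() function, since ImageData, Utils, and their dependencies aren't shown here (the seed-based computation inside is purely hypothetical filler):

```python
from multiprocessing import Pool

def build_entry(args):
    """Top-level worker: picklable, unlike a bound method.
    Stands in for creating an ImageData and calling interpolate_points on it."""
    width, height, seed = args
    # Hypothetical placeholder for the real per-image work
    return [[seed * (x + y) for y in range(height)] for x in range(width)]

if __name__ == "__main__":
    tasks = [(4, 4, n) for n in range(8)]      # one argument tuple per image
    with Pool(5) as pool:
        entries = pool.map(build_entry, tasks)  # each task runs in a worker process
    print(len(entries))
```

Each call to build_entry is independent, so Pool.map can fan the tasks out across processes and collect the results in order.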
def generate_pixel_histogram(self, images, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed by each pixel in the specified bins
    """
    max_value = 0.0
    min_value = 0.0
    for i in range(len(images)):
        image = images[i]
        max_entry = max(max(p[1:]) for p in image.data)
        min_entry = min(min(p[1:]) for p in image.data)
        if max_entry > max_value:
            max_value = max_entry
        if min_entry < min_value:
            min_value = min_entry
    interval_size = (math.fabs(min_value) + math.fabs(max_value)) / bins
    for x in range(self.width):
        for y in range(self.height):
            pixel_histogram = {}
            for i in range(bins + 1):
                key = round(min_value + (i * interval_size), 2)
                print key
                pixel_histogram[key] = 0.0
            for i in range(len(images)):
                image = images[i]
                value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                pixel_histogram[value] += 1.0 / len(images)
            self.data[x][y] = pixel_histogram
The method for generating a pixel histogram is another case. Here I have multiple images, and for each position of my image I have to build a histogram. Each position is independent of the others, so I believe this is clearly a case that could be parallelized.
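Since each pixel is independent, my understanding is that one could split the image into rows, compute every row's histograms in a worker, and reassemble the rows afterwards. A sketch under simplified assumptions (here each pixel's values across all images are pre-gathered into a list, and the binning rule is a hypothetical floor-to-bin-edge stand-in for Utils.get_bin, which isn't shown):

```python
from multiprocessing import Pool

def histogram_row(args):
    """Worker: build a histogram dict for each pixel in one row.
    row_values[y] is the list of values that pixel takes across all images."""
    row_values, min_value, interval_size = args
    out = []
    for values in row_values:
        hist = {}
        for v in values:
            # Assumed binning rule: floor the value to the nearest bin edge
            key = round(min_value + int((v - min_value) / interval_size) * interval_size, 2)
            hist[key] = hist.get(key, 0.0) + 1.0 / len(values)
        out.append(hist)
    return out

if __name__ == "__main__":
    # 3 rows x 2 pixels, each pixel observed across 4 "images"
    rows = [[[0.1, 0.2, 0.8, 0.9], [0.5, 0.5, 0.5, 0.5]] for _ in range(3)]
    with Pool(3) as pool:
        data = pool.map(histogram_row, [(r, 0.0, 0.25) for r in rows])
```

Row-level chunks keep the pickling overhead per task low compared with dispatching one task per pixel.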
The problem is that I have looked at multiprocessing in Python, Cython, and so on, but I couldn't figure out how to apply any of them to my code. I have never used multiprocessing in practice, so I'm having some difficulty applying the concept to my problem.
I've tried this:
p = Pool(5)
for n in range(images):
    entry = ImD.ImageData(width, height)
    entries.append(entry)
p.map(ImD.interpolate_points, entries)
But it doesn't work, since I'm working with a class: interpolate_points is an instance method, not a standalone function.
Any help would be appreciated. Thanks in advance.