I'm working on a project in which some processes take a long time to finish (more than 2 hours in total), and some of them look clearly parallelizable. Here are a couple of examples:

entries = []
for n in range(images):
    entry = ImD.ImageData(width, height)
    entry.interpolate_points(seeds)  # this call dominates the runtime
    entries.append(entry)

def interpolate_points(self, seeds):
    points = []
    f = []
    for i in range(seeds):
        # Generate a cell position
        pos_x = random.randrange(self.width)
        pos_y = random.randrange(self.height)

        # Save the f(x,y) data
        x = Utils.translate_range(pos_x, 0, self.width, self.range_min, self.range_max)
        y = Utils.translate_range(pos_y, 0, self.height, self.range_min, self.range_max)
        z = Utils.function(x, y)
        points.append([x, y])
        f.append(z)
    for x in range(self.width):
        xt = Utils.translate_range(x, 0, self.width, self.range_min, self.range_max)
        for y in range(self.height):
            yt = Utils.translate_range(y, 0, self.height, self.range_min, self.range_max)
            self.data[x][y] = Utils.shepard_euclidian(points, f, [xt, yt], 3)

The interpolate_points method takes a significant amount of time to complete, and since I call it more than 40 times, I believe I could run some of these calls in parallel.

def generate_pixel_histogram(self, images, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed for each pixel in a specified bins
    """
    max_value = 0.0
    min_value = 0.0
    for image in images:
        max_entry = max(max(p[1:]) for p in image.data)
        min_entry = min(min(p[1:]) for p in image.data)
        max_value = max(max_value, max_entry)
        min_value = min(min_value, min_entry)

    interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins

    for x in range(self.width):
        for y in range(self.height):
            pixel_histogram = {}
            for i in range(bins + 1):
                key = round(min_value + (i * interval_size), 2)
                pixel_histogram[key] = 0.0
            for image in images:
                value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                pixel_histogram[value] += 1.0 / len(images)
            self.data[x][y] = pixel_histogram

The method that generates the pixel histograms is another case. Here I have multiple images, and for each position of my image I have to build a histogram. Each position is clearly independent of the others, so I believe this is another case that could be parallelized.

The problem is that I have looked at multiprocessing in Python, at Cython, and so on, but I couldn't figure out how to apply any of it to my code. I have never worked with multiprocessing in practice, so I'm having some difficulty applying the concept to my problem.

I've tried this:

from multiprocessing import Pool

p = Pool(5)
entries = []
for n in range(images):
    entry = ImD.ImageData(width, height)
    entries.append(entry)

p.map(ImD.interpolate_points, entries)

But it doesn't work, apparently because I'm working with a class: from what I've read, Pool.map has to pickle whatever it sends to the worker processes, and methods can't be pickled in Python 2 (and map wouldn't supply the seeds argument either).

Any help would be appreciated. Thanks in advance.

pceccon

1 Answer

You could try the parallel map from multiprocessing. It follows a kind of "queue" model: you put in a lot of tasks, bring up some worker processes, and they work through them.

http://docs.python.org/2/library/multiprocessing.html

An example (adapted from that page):

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    # f is defined at module level, before the Pool is created,
    # so the worker processes can find it when unpickling tasks.
    p = Pool(5)
    print p.map(f, range(50))

This will bring up 5 worker processes, and they will take their work items from the list you passed to map.

Notice that the order in which the tasks get processed is not guaranteed (although map does return the results in input order).
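
Applied to your first example, a sketch might look like the following. Note that build_entry is a name I'm making up, and that width, height, seeds and images are assumed to be defined as in your question; the key point is that the function passed to map must be a plain module-level function so it can be pickled:

from multiprocessing import Pool

import ImD  # assuming this is the module that defines ImageData

def build_entry(args):
    # Must be a module-level function (not a method) so Pool can pickle it.
    width, height, seeds = args
    entry = ImD.ImageData(width, height)
    entry.interpolate_points(seeds)
    return entry

if __name__ == '__main__':
    # width, height, seeds and images defined elsewhere, as in the question.
    p = Pool(5)
    entries = p.map(build_entry, [(width, height, seeds)] * images)

Each worker builds and returns one finished ImageData object (which has to be picklable, since it is sent back to the parent process), and map collects them into entries in input order.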

finiteautomata
  • Thanks @geekazoid. This seems to apply to my first problem (I will try this), but I guess I need another approach for the second one. Do you know how I could parallelize the second example? Thank you very much. – pceccon Jan 06 '14 at 14:51
  • I don't clearly understand the problem, but suppose the issue is with that double for. You could convert that double for into a single sequence using the cartesian product: itertools.product gives you the cartesian product of two iterators (or lists) and returns an iterator, not another long list ;) http://docs.python.org/2/library/itertools.html (see the sketch after these comments) – finiteautomata Jan 06 '14 at 15:27
  • I don't know how to apply your example to my problem. I have three lines of code that have to be executed for one instance. Do I have to create a method for it before using your example, @geekazoid? – pceccon Jan 07 '14 at 10:42
  • @pceccon Sorry for the delay. Yes, you have to create a separate function. Tell me if that works for you. – finiteautomata Jan 14 '14 at 19:57
  • Yeah, I did it. But somehow it's much slower than without multiprocessing! OO http://stackoverflow.com/questions/21136404/multiprocessing-taking-more-time-then-no-multiprocessing-python – pceccon Jan 15 '14 at 12:21
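
For reference, a sketch of the itertools.product idea applied to the second example. It is untested against the original code: compute_histogram is a hypothetical module-level wrapper around the per-pixel work, and images, min_value, interval_size, bins, width and height are assumed to be module-level names (inherited by the forked workers):

import itertools
from multiprocessing import Pool

import Utils  # the questioner's utility module

def compute_histogram(coords):
    # Hypothetical worker: builds the histogram for one (x, y) position,
    # reading the module-level images/min_value/interval_size/bins.
    x, y = coords
    pixel_histogram = {}
    for i in range(bins + 1):
        key = round(min_value + (i * interval_size), 2)
        pixel_histogram[key] = 0.0
    for image in images:
        value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
        pixel_histogram[value] += 1.0 / len(images)
    return coords, pixel_histogram

if __name__ == '__main__':
    p = Pool(5)
    # Flatten the double loop into a single sequence of (x, y) pairs.
    coords = itertools.product(range(width), range(height))
    results = dict(p.map(compute_histogram, coords))
    # results[(x, y)] now holds the histogram for each position; copy it
    # back into self.data in the calling code.

If each per-pixel histogram is cheap, the inter-process overhead can easily outweigh the work itself, which may explain the slowdown in the last comment; mapping over rows instead (one task per x, with the y loop inside the worker) gives each task more work per pickle/unpickle round trip.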