
So, I have a 3-dimensional list. For example:

A=[[[1,2,3],[4,5,6],[7,8,9]],...,[[2,4,1],[1,4,6],[1,2,4]]]

I want to process each 2-dimensional list in A independently, but they all go through the same process. If I do it sequentially, I write:

for i in range(len(A)):
    A[i]=process(A[i])

But it takes a very long time. Could you tell me how to parallelize this with data parallelism in Python?

fahadh4ilyas
  • Parallelization requires threading. That requires a lot of work. Since lists are mutable, you'll likely have to create a copy for each individual thread (if I'm smart at programming). Copying/slicing the list will likely take more time than processing it in a single thread. – Zizouz212 Oct 27 '16 at 01:43
  • List of options here... https://wiki.python.org/moin/ParallelProcessing – OneCricketeer Oct 27 '16 at 01:45
  • @Zizouz212 so, there's no way that's more efficient than processing it sequentially? – fahadh4ilyas Oct 27 '16 at 01:56
  • Threading in Python is actually not parallel at all and would slow things down, because the Global Interpreter Lock only executes one Python instruction at a time regardless of the number of threads. It's only suitable for doing something while waiting for I/O on another thread. True multiprocessing would work, but sharing the data requires locking and would also quite likely slow things down if there's a lot of writing to the array. – Jim Stewart Oct 27 '16 at 01:56
  • @cricket_007 Thanks, I'll try it. – fahadh4ilyas Oct 27 '16 at 01:56
  • @Jim Stewart if I made that list into part by myself like B=A[0:10], C=A[10:20], etc, could you tell me how to process B, C, etc at the same time? – fahadh4ilyas Oct 27 '16 at 02:00
  • @user7077941 My first choice would be to try to implement `process` with numpy. – Francisco Oct 27 '16 at 02:02
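As the last comment suggests, if `process` can be expressed as elementwise array operations, vectorizing with numpy may beat any parallel scheme. A minimal sketch, assuming (purely for illustration) that `process` squares every element:

```python
import numpy as np

A = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[2, 4, 1], [1, 4, 6], [1, 2, 4]]]

# Convert the whole 3-D list into one array and apply the
# operation to every element at once, with no Python-level loop.
arr = np.asarray(A)
squared = arr ** 2
```

Whether this applies depends entirely on what `process` actually does; it only works when the operation maps onto numpy's array functions.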

1 Answer


If you have multiple cores and processing each 2-dimensional list is an expensive operation, you could use Pool from multiprocessing. Here's a short example that squares the numbers in different processes:

import multiprocessing as mp

A = [[[1,2,3],[4,5,6],[7,8,9]],[[2,4,1],[1,4,6],[1,2,4]]]

def square(l):
    # square every element of one 2-dimensional list
    return [[x * x for x in sub] for sub in l]

pool = mp.Pool(processes=mp.cpu_count())
res = pool.map(square, A)  # each 2-D list is sent to a worker process
pool.close()
pool.join()

print res

Output:

[[[1, 4, 9], [16, 25, 36], [49, 64, 81]], [[4, 16, 1], [1, 16, 36], [1, 4, 16]]]

Pool.map behaves like the built-in map while splitting the iterable across the worker processes. It also has a third parameter called chunksize that defines how large the chunks submitted to the workers are.
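For instance, a sketch of passing chunksize (the batch size of 10 and the 100-element input here are arbitrary choices for illustration, not recommendations):

```python
import multiprocessing as mp

def square(l):
    # square every element of one 2-dimensional list
    return [[x * x for x in sub] for sub in l]

# 100 small 2-dimensional lists
A = [[[i, i + 1], [i + 2, i + 3]] for i in range(100)]

pool = mp.Pool(processes=mp.cpu_count())
# chunksize=10: each worker receives 10 sublists per task,
# which reduces inter-process communication overhead
res = pool.map(square, A, 10)
pool.close()
pool.join()
```

Note that on Windows this would need to live under an `if __name__ == '__main__':` guard, since multiprocessing re-imports the module in each worker.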

niemmi
  • Thank you very much! I'll try it later because my program has been running sequentially for 30 minutes now. So, I don't have to split `A` into parts; python will split it? – fahadh4ilyas Oct 27 '16 at 02:20
  • @user7077941 No, you don't have to split `A`. You might want to pass the third parameter `chunksize` though, depending on your data. – niemmi Oct 27 '16 at 02:30
  • Wow, such a simple way. Thank you. ^^ – fahadh4ilyas Oct 27 '16 at 02:32
  • Hi, I'm confused. I tried using your code but somehow error appeared "AttributeError" in with mp.Pool(processes=mp.cpu_count()) as pool. What should I do? – fahadh4ilyas Oct 27 '16 at 03:16
  • @user7077941 Didn't notice python 2.7 tag so I was running it with 3.5, I've edited the answer to work on Python 2.7. – niemmi Oct 27 '16 at 03:20
  • Can I use lambda function? Or I must use named function? – fahadh4ilyas Oct 27 '16 at 03:25
  • @user7077941 You have to use named function since [lambdas can't be pickled](http://stackoverflow.com/questions/16626429/python-cpickle-pickling-lambda-functions). – niemmi Oct 27 '16 at 03:30
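To illustrate the workaround from the comment above: keep the function at module level, and if a lambda was only there to bind an extra argument, `functools.partial` wrapped around a named function can be pickled. A minimal sketch (the `scale` function and the factor of 10 are made up for illustration):

```python
import multiprocessing as mp
from functools import partial

def scale(factor, l):
    # multiply every element of one 2-dimensional list by factor
    return [[x * factor for x in sub] for sub in l]

A = [[[1, 2], [3, 4]]]

pool = mp.Pool(2)
# partial objects wrapping a module-level function can be
# pickled, unlike lambdas
res = pool.map(partial(scale, 10), A)
pool.close()
pool.join()
```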