
I wrote code like this:

def process(data):
    # create file using data
    pass

all = ["data1", "data2", "data3"]

I want to execute the process function on everything in my all list in parallel. The files it creates are small, so I'm not concerned about disk writes, but the processing takes a long time and I want to use all of my cores.

How can I do this using only the default (standard library) modules in Python 2.7?

    To help you search next time, keywords: `multiprocessing python map`. Then you can easily find a solution on Google. – Sraw Aug 13 '18 at 03:44

3 Answers


Assuming CPython and the GIL here.

If your task is I/O bound, in general, threading may be more efficient since the threads are simply dumping work on the operating system and idling until the I/O operation finishes. Spawning processes is a heavy way to babysit I/O.

However, most file systems aren't concurrent, so using multithreading or multiprocessing may not be any faster than synchronous writes.

Nonetheless, here's a contrived example of multiprocessing.Pool.map which may help with your CPU-bound work:

from multiprocessing import cpu_count, Pool

def process(data):
    # best to do heavy CPU-bound work here...

    # file write for demonstration
    with open("%s.txt" % data, "w") as f:
        f.write(data)

    # example of returning a result to the map
    return data.upper()

# the guard matters on platforms that spawn workers (e.g. Windows)
if __name__ == "__main__":
    tasks = ["data1", "data2", "data3"]
    pool = Pool(cpu_count() - 1)
    print(pool.map(process, tasks))

A similar setup for threading can be found in concurrent.futures.ThreadPoolExecutor (standard in Python 3; on 2.7 it's available via the `futures` backport).
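For comparison, here's a minimal sketch of the same map done with a thread pool; it assumes Python 3 (or the `futures` backport on 2.7) and reuses the same process function, so it mainly helps when the work is I/O-bound rather than CPU-bound:

from concurrent.futures import ThreadPoolExecutor

def process(data):
    # same small file write as above
    with open("%s.txt" % data, "w") as f:
        f.write(data)
    return data.upper()

tasks = ["data1", "data2", "data3"]

# threads share one interpreter, so under the GIL this helps for
# I/O-bound work, not for CPU-bound work
with ThreadPoolExecutor(max_workers=4) as executor:
    print(list(executor.map(process, tasks)))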

As an aside, all is a builtin function and isn't a great variable name choice.


Or:

from threading import Thread

def process(data):
    print("processing {}".format(data))

l = ["data1", "data2", "data3"]

for task in l:
    t = Thread(target=process, args=(task,))
    t.start()

Or (Python ≥ 3.6 only, using an f-string):

from threading import Thread

def process(data):
    print(f"processing {data}")

l = ["data1", "data2", "data3"]

for task in l:
    t = Thread(target=process, args=(task,))
    t.start()
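
These snippets start the threads but never wait for them; if you need to block until every worker has finished (a pattern not shown in the original answer), keep the Thread objects around and join them:

from threading import Thread

def process(data):
    print("processing {}".format(data))

tasks = ["data1", "data2", "data3"]

threads = [Thread(target=process, args=(task,)) for task in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for each worker to finish before moving on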

Here is a template using the multiprocessing Pool API (via multiprocessing.dummy); hope it helps.

from multiprocessing.dummy import Pool as ThreadPool

def process(data):
    print("processing {}".format(data))

alldata = ["data1", "data2", "data3"]

pool = ThreadPool()

results = pool.map(process, alldata)

pool.close()
pool.join()
    OP says "I want to use all of my cores". `ThreadPool` won't use more than one core. – Sraw Aug 13 '18 at 03:43
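
As the comment points out, multiprocessing.dummy wraps threads, so it won't spread CPU-bound work across cores. A sketch of the one-line change that switches the template above to real processes (the Pool API is the same) might look like this:

from multiprocessing import Pool  # process-backed pool instead of multiprocessing.dummy

def process(data):
    print("processing {}".format(data))

if __name__ == "__main__":
    alldata = ["data1", "data2", "data3"]

    pool = Pool()  # defaults to one worker per CPU core
    results = pool.map(process, alldata)

    pool.close()
    pool.join()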