
I am trying to use multiprocessing with Parselmouth's Praat functions to run multiple feature-extraction methods in parallel.

    import concurrent.futures
    import time

    with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(self.f1), executor.submit(self.f2)]
        for future in concurrent.futures.as_completed(futures):
            data = future.result()

But I get this error, which is caused by unpicklable objects during multiprocessing:

    The above exception was the direct cause of the following exception:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-15-2dea24cd4bf1> in <module>
          2 strr = time.time()
          3 FeatureCalculateobj = FeatureCalculate(os.path.join(root,'P1_A2.wav'),async_flag=True)
          4 print(time.time()-strr)
          5 

    <ipython-input-13-f5b7e5b9fab3> in __init__(self, path, async_flag)
         26 
         27         if async_flag:
         28             self.main()
         29         else:
         30             self.str_pauses = self.extract_pauses_unasync()

    <ipython-input-13-f5b7e5b9fab3> in main(self)
        176             futures = [executor.submit(self.jitter_shimmer, )]
        177             for future in concurrent.futures.as_completed(futures):
        178                 data = future.result()
        179             print(data)
        180 

    ~/miniconda3/envs/via_3.7/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
        426                 raise CancelledError()
        427             elif self._state == FINISHED:
        428                 return self.__get_result()
        429 
        430             self._condition.wait(timeout)

    ~/miniconda3/envs/via_3.7/lib/python3.7/multiprocessing/queues.py in _feed(buffer, notempty, send_bytes, writelock, close, ignore_epipe, onerror, queue_sem)
        234 
        235                 # serialize the data before acquiring the lock
        236                 obj = _ForkingPickler.dumps(obj)
        237                 if wacquire is None:
        238                     send_bytes(obj)

    ~/miniconda3/envs/via_3.7/lib/python3.7/multiprocessing/reduction.py in dumps(cls, obj, protocol)
         49     def dumps(cls, obj, protocol=None):
         50         buf = io.BytesIO()
         51         cls(buf, protocol).dump(obj)
         52         return buf.getbuffer()
         53 

    TypeError: can't pickle parselmouth.Sound objects

Although I am not passing parselmouth.Sound objects as arguments to the class methods, I still get this error.
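The likely reason: `ProcessPoolExecutor` pickles the callable you submit, and a bound method like `self.f1` drags its entire instance along, including every attribute, so a `parselmouth.Sound` stored on `self` breaks the submission even though it is never an argument. A minimal sketch of the effect, using an open file handle as a stand-in for the unpicklable Sound (the `Analyzer` class and its names are hypothetical, not from the question):

```python
import os
import pickle

class Analyzer:
    def __init__(self):
        # Stand-in for an unpicklable attribute such as a parselmouth.Sound
        self.handle = open(os.devnull)

    def f1(self):
        return 42

def can_pickle(obj):
    # Report whether `obj` can be serialized the way multiprocessing would do it
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

Here `can_pickle(Analyzer().f1)` is `False` because pickling the bound method pickles `self`, while `can_pickle("P1_A2.wav")` is `True` — which is why handing workers a plain filename and rebuilding the Sound inside each process sidesteps the error.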

Roma Jain

2 Answers


Hmmm, apparently Parselmouth objects don't support the pickle protocol yet (see the error message TypeError: can't pickle parselmouth.Sound objects). I had never thought of it in the context of multiprocessing, but it might indeed be worth implementing for these kinds of use cases!

I'll see if I can get it in the next release. Something tells me it shouldn't be ridiculously hard :-)

Meanwhile, what you could probably do is write the file to a temporary directory and read it again in each of the separate processes of multiprocessing?
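That workaround could be sketched roughly like this — a hedged sketch, not an official recipe: the function names, the `max_workers` value, and the specific Praat commands and their parameters are illustrative assumptions, and only the file path (a plain, picklable string) crosses the process boundary:

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor

def jitter_from_file(path):
    # Runs in the worker: re-create the Sound from the file, so the
    # unpicklable object itself never has to be pickled.
    import parselmouth  # imported here so each worker loads its own copy
    sound = parselmouth.Sound(path)
    point_process = parselmouth.praat.call(sound, "To PointProcess (periodic, cc)", 75, 600)
    return parselmouth.praat.call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

def jitter_via_tempfile(sound):
    # Parent side: save the Sound to a temporary WAV file and hand only
    # the path to the worker process.
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp_path = os.path.join(tmpdir, "snippet.wav")
        sound.save(tmp_path, "WAV")
        with ProcessPoolExecutor(max_workers=2) as executor:
            return executor.submit(jitter_from_file, tmp_path).result()
```

Something like `jitter_via_tempfile(parselmouth.Sound("P1_A2.wav"))` would then compute the measure in a separate process without ever pickling the Sound.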

Yannick Jadoul

You can do multiprocessing in Parselmouth by using Python's multiprocessing module and making sure the sounds are created in each process; on Windows, also create the pool inside an if __name__ == "__main__": block. asyncio is not meant for CPU-bound work, which Parselmouth's analyses are, so multiprocessing is more appropriate. Here is a working example:

import parselmouth
from multiprocessing import Pool, cpu_count
from glob import glob

# First make a function to multiprocess
def get_mean_pitch(filename):
    try:
        pitch = parselmouth.Sound(filename).to_pitch()
        mean_pitch = parselmouth.praat.call(pitch, "Get mean", 0, 0, "Hertz")
    except parselmouth.PraatError:
        mean_pitch = 0
    return mean_pitch

if __name__ == "__main__":
    # Then make a list of wav files to iterate over
    sound_filenames = sorted(glob("*.wav"))

    # Create a pool with one worker per CPU on the machine
    with Pool(cpu_count()) as pool:
        # Map the function over the list of files
        results = pool.map(get_mean_pitch, sound_filenames)
DRFeinberg