I'm trying to use multiprocessing in Python (2.7.8) on Mac OS X. After reading Velimir Mlaker's answer to this question, I was able to use multiprocessing.Pool() to multiprocess a trivially simple function, but it doesn't work with my actual function: I get the right results, but it executes serially. I believe the problem is that my function loops over a music21 Stream, which is similar to a list but has special functionality for music data. I believe that music21 streams cannot be pickled, so is there some multiprocessing alternative to Pool that I can use? I don't mind if results are returned out of order, and I can upgrade to a different version of Python if necessary. I've included my code for the multiprocessing task but not for the stream_indexer() function it calls. Thank you!

import multiprocessing as mp
def basik(test_piece, part_numbers):
    jobs = []
    for i in part_numbers:
        # Each 2-tuple in jobs has an index <i> and a music21 stream that
        # corresponds to an individual part in a musical score.
        jobs.append((i, test_piece.parts[i]))
    pool = mp.Pool(processes=4)
    results = pool.map(stream_indexer, jobs)
    pool.close()
    pool.join()

    return results
  • Which operating system (mp works differently on Windows)? It's not a pickle problem, or you would get an exception. Is the stream in memory? mp may be serializing and passing the entire stream to the worker. – tdelaney May 08 '15 at 22:56
  • Thanks for your input; I'm working on Mac OS X. The program is definitely running serially when it's given streams, and each tuple in the list has a stream as its second element, but why would that cause it to run serially? Also, how could I avoid passing the entire stream to each worker? – Alex May 10 '15 at 00:41
  • Can you defer the creation of the stream to the worker? That would aid parallelization. You could also try putting the streams in a global list before creating the pool and passing the list index to the worker; since the child gets a copy-on-write view of the parent's memory, the stream would already be there (see the first sketch after these comments). If the streams are on disk, you are ultimately dependent on storage speed. – tdelaney May 10 '15 at 17:30
  • I also tried deferring the creation of the stream to the worker, but even when I do that it still executes serially, so I think it's because streams can't be pickled. – Alex May 22 '15 at 18:28
  • I think that your general question is interesting enough that I won't call this an "answer", but it is possible to pickle a music21 Stream; look at the music21.freezeThaw library. Call StreamFreezer(s, fastButUnsafe=True) and pass that stream along, then call StreamThawer() on the pickled data (see the second sketch after these comments). I'll see if I can make this more automated. – Michael Scott Asato Cuthbert Aug 03 '15 at 21:20
  • Btw, the newest (git) versions of music21 have Streams that are usually pickleable. – Michael Scott Asato Cuthbert Aug 05 '15 at 00:41
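
Following up on tdelaney's copy-on-write suggestion, here is a minimal sketch of that pattern. It assumes the fork start method (the default on Mac OS X), and stream_indexer stands in for the asker's function, which isn't shown in the question:

import multiprocessing as mp

# Module-level list, filled in *before* the pool is created, so that the
# forked workers inherit it via copy-on-write instead of pickling.
JOBS = []

def run_job(k):
    # Look the (index, stream) tuple up locally; only the small
    # integer k crossed the process boundary.
    return stream_indexer(JOBS[k])

def basik(test_piece, part_numbers):
    global JOBS
    JOBS = [(i, test_piece.parts[i]) for i in part_numbers]
    pool = mp.Pool(processes=4)
    results = pool.map(run_job, range(len(JOBS)))
    pool.close()
    pool.join()
    return results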
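
And the freeze/thaw round trip from Michael's comment looks roughly like this (a sketch following the music21.freezeThaw documentation; writeStr and openStr serialize a Stream to and from a pickleable string):

from music21 import corpus, freezeThaw

s = corpus.parse('bach/bwv66.6')

# Freeze the stream into a pickleable string...
sf = freezeThaw.StreamFreezer(s, fastButUnsafe=True)
data = sf.writeStr(fmt='pickle')

# ...then thaw it back into a full Stream, e.g. inside a worker process.
st = freezeThaw.StreamThawer()
st.openStr(data)
s2 = st.stream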

1 Answer

The newest git commits of music21 have features, based on joblib, to help with some of the trickier parts of multiprocessing. For instance, if you want to count all the notes in each part of a score, you would normally do it in serial:

import music21
def countNotes(s):
    # use recurse() instead of .flat to avoid building certain caches
    return len(s.recurse().notes)

bach = music21.corpus.parse('bach/bwv66.6')
[countNotes(p) for p in bach.parts]

In parallel, it works like this:

music21.common.runParallel(list(bach.parts), countNotes)

BUT! Here's the huge caveat. Let's time these:

In [5]: %timeit music21.common.runParallel(list(bach.parts), countNotes)
10 loops, best of 3: 152 ms per loop

In [6]: %timeit [countNotes(p) for p in bach.parts]
100 loops, best of 3: 2.19 ms per loop

On my computer (2 cores, 4 threads), running in parallel is roughly 70x SLOWER than running in serial. Why? Because there's significant overhead in preparing a Stream to be multiprocessed. Passing Streams around in multiprocessing is only worth it if the routine being run is very slow (on the order of 1 ms per note, divided by the number of processors). Otherwise, see if there are ways to pass only small bits of information back and forth, such as the file path of the piece to process:

def parseCountNotes(fn):
    # parsing happens inside the worker; only the path is passed in
    s = music21.corpus.parse(fn)
    return len(s.recurse().notes)

bach40 = [b.sourcePath for b in music21.corpus.search('bwv')[0:40]]

In [32]: %timeit [parseCountNotes(b) for b in bach40]
1 loops, best of 3: 2.39 s per loop

In [33]: %timeit music21.common.runParallel(bach40, parseCountNotes)
1 loops, best of 3: 1.83 s per loop

Here we're beginning to get speedups even on a MacBook Air. On my office Mac Pro, the speedups become huge for calls like this, because in this case the call to parse massively dominates the time spent in recurse().
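
Since runParallel is built on joblib, readers on older music21 versions can get the same path-passing pattern with joblib directly; a minimal sketch, assuming joblib is installed:

from joblib import Parallel, delayed
import music21

def parseCountNotes(fn):
    s = music21.corpus.parse(fn)
    return len(s.recurse().notes)

bach40 = [b.sourcePath for b in music21.corpus.search('bwv')[0:40]]

# n_jobs=-1 uses all available cores; only the short path strings are
# pickled to the workers, never the parsed Streams.
counts = Parallel(n_jobs=-1)(delayed(parseCountNotes)(fn) for fn in bach40)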