I am using the multiprocessing pool.starmap function, and I have discovered a strange issue.

from multiprocessing import Pool
p = Pool()

NODE = [1, 2, 3, 4]
PageRank = [0.25, 0.25, 0.25, 0.25]
Destination = [[2, 3, 4], [3, 4], [1, 4], [2]]

Data = zip(NODE, PageRank, Destination)

So I use the zip function to create a data set, Data, which is a list in which each entry is a tuple of length 3. Then I call the function

p.starmap(MyFunction, zip(NODE,PageRank,Destination))

It works great.

However, when I type

p.starmap(MyFunction, Data)

It outputs an empty list, []! I really have no clue what is going on. I literally just replaced zip(NODE,PageRank,Destination) with Data, which should be the same thing, right?

Could this be happening because I am using a Jupyter notebook?

KevinKim
  • Both ways work fine for me when I actually write a Python script that executes both lines. It is true that [`multiprocessing` does not work well in interactive prompts](https://stackoverflow.com/questions/23641475/multiprocessing-working-in-python-but-not-in-ipython/23641560), so it's possible that's the cause of the issue you're seeing in Jupyter, but I'm not sure. – dano Aug 01 '19 at 14:07
  • Both versions actually work for me in a regular Python interactive interpreter on Linux, as long as I define `MyFunction` in a script and import it, rather than defining it in the interpreter. – dano Aug 01 '19 at 14:09
  • In case you are doing other things with the zip object, note that it can only be iterated once. If by the time you use it in starmap you are already at the end of the iterator, an empty list is what you should expect. – brentertainer Aug 01 '19 at 14:10
  • Ah, I think @brentertainer is probably right. – dano Aug 01 '19 at 14:14
  • @brentertainer could you please elaborate? I just started learning this. What do you mean by "note that it can only be iterated once"? Thanks! – KevinKim Aug 01 '19 at 14:16
  • And you can wrap it in a list to test if that is the problem, e.g. `list(zip(...))`. With a list, you can iterate as many times as you want. – brentertainer Aug 01 '19 at 14:17
  • @ftxx Do a search for the difference between an iterator and an iterable. Like [this question](https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration). Zip objects are the former, lists the latter. – brentertainer Aug 01 '19 at 14:21

1 Answer

This answer is only valid if

  • you are using Python 3, and
  • you are doing things with the zip object (e.g. debug printing) that do not appear in your post

In Python 2, zip(...) returns a list; in Python 3, however, it returns a zip object, which is not a list as your post assumes.
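A quick way to confirm this in Python 3 is to inspect the object directly. This is a minimal sketch reusing the data from the question; the checks rely only on the built-in `zip` and `collections.abc.Iterator`.

```python
from collections.abc import Iterator

NODE = [1, 2, 3, 4]
PageRank = [0.25, 0.25, 0.25, 0.25]
Destination = [[2, 3, 4], [3, 4], [1, 4], [2]]

Data = zip(NODE, PageRank, Destination)

print(type(Data))                  # <class 'zip'>, not <class 'list'>
print(isinstance(Data, list))      # False
print(isinstance(Data, Iterator))  # True: zip objects are iterators
```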

A zip object is an iterator, and so can only be iterated over once. After you reach the end of the iterator, any attempt to iterate over it again will yield nothing. For example,

>>> z = zip([1, 2], [3, 4])
>>> for x in z:
...     x
... 
(1, 3)
(2, 4)
>>> for x in z:
...     x
... 
>>> list(z)
[]

Regarding my second bullet point: I suspect you are doing something seemingly innocuous, such as printing all the elements of Data, before you pass it as an argument to pool.starmap. If so, you exhaust the iterator and then effectively tell pool.starmap to apply MyFunction to nothing at all.
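The following sketch reproduces that situation. It assumes a hypothetical `my_function` standing in for your `MyFunction`, and it uses `multiprocessing.pool.ThreadPool`, which exposes the same `starmap` method as `Pool` but runs in threads, so it also behaves the same in an interactive session or notebook without pickling concerns.

```python
from multiprocessing.pool import ThreadPool

def my_function(node, rank, destinations):
    # Hypothetical stand-in for MyFunction: pair each node with its out-degree.
    return node, len(destinations)

NODE = [1, 2, 3, 4]
PageRank = [0.25, 0.25, 0.25, 0.25]
Destination = [[2, 3, 4], [3, 4], [1, 4], [2]]

Data = zip(NODE, PageRank, Destination)

# An innocent-looking debug print walks the iterator all the way to its end...
for row in Data:
    print(row)

# ...so starmap now receives an iterator with nothing left in it.
with ThreadPool(2) as p:
    result = p.starmap(my_function, Data)

print(result)  # []
```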

To fix this, you have three options.

  1. Use the first approach from your post, in which the zip object is created inside the call to pool.starmap.
  2. Do not loop over Data prior to passing it to pool.starmap.
  3. Convert the zip object to a list (Data = list(zip(NODE,PageRank,Destination))). A list is an iterable but not an iterator, so you can iterate over it as many times as you would like.
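Option 3 can be sketched as follows, again with a hypothetical `my_function` and `ThreadPool` standing in for the real `MyFunction` and `Pool`. Because `Data` is now a list, printing it first consumes nothing, and the same list can even be passed to `starmap` more than once.

```python
from multiprocessing.pool import ThreadPool

def my_function(node, rank, destinations):
    # Hypothetical stand-in for MyFunction: pair each node with its out-degree.
    return node, len(destinations)

NODE = [1, 2, 3, 4]
PageRank = [0.25, 0.25, 0.25, 0.25]
Destination = [[2, 3, 4], [3, 4], [1, 4], [2]]

# Materialize the zip object once; a list can be traversed any number of times.
Data = list(zip(NODE, PageRank, Destination))

for row in Data:  # debug printing no longer exhausts anything
    print(row)

with ThreadPool(2) as p:
    first = p.starmap(my_function, Data)
    second = p.starmap(my_function, Data)  # reusing the list still works

print(first)            # [(1, 3), (2, 2), (3, 2), (4, 1)]
print(second == first)  # True
```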

In my humble opinion, this issue is just a rite of passage for newcomers to Python. If it applies to you and you want to learn more, read up on the differences between an iterator and an iterable, perhaps starting with the Stack Overflow question on iterators, iterables, and iteration linked in the comments on the question.
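The core of that distinction fits in a few lines: calling `iter()` on an iterable such as a list returns a fresh iterator each time, whereas calling it on an iterator such as a zip object returns the very same object, so its position is shared and never rewinds. A minimal sketch using only built-ins:

```python
pairs_list = [(1, 3), (2, 4)]
pairs_zip = zip([1, 2], [3, 4])

# A list is an iterable: each iter() call starts a new pass from the beginning.
print(iter(pairs_list) is pairs_list)      # False
print(list(pairs_list), list(pairs_list))  # [(1, 3), (2, 4)] [(1, 3), (2, 4)]

# A zip object is its own iterator: iter() hands back the same stateful object.
print(iter(pairs_zip) is pairs_zip)  # True
first_pass = list(pairs_zip)   # [(1, 3), (2, 4)]
second_pass = list(pairs_zip)  # [] because the single pass was already used up
print(first_pass, second_pass)
```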

brentertainer