17

I am trying to run a simple command that guesses gender by name using multiprocessing. This code worked on a previous machine so perhaps my setup had something to do with it.

Below is my multiprocessing code:

import sys
import gender_guesser.detector as gender
import multiprocessing
import time

d = gender.Detector()

def guess_gender (name):
    n = name.title() # make first letter upper case and the rest lower case 
    g = d.get_gender(n) # guess gender
    return g

ls = ['john','joe','amamda','derick','peter','ashley','john','joe','amamda','derick','peter','ashley']

t=time.time()

results=[]
def callBack(x):
    results.append(x)

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count()-1, maxtasksperchild=1)

for n in ls:
    print (n)
    pool.apply_async(guess_gender,args=[n],callback=callBack)

pool.close()
pool.join()

results = pd.concat(results)

print(time.time()-t)

It simply runs and doesn't do anything. In my cmd window, I see the following at the end of an error message:

AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>

Am running python version 3.6.1 on Anaconda:

import sys
print(sys.version)
3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]

Update: Still cannot get it to work. Below is the entire cmd log when I ran the code provided. I appreciate any thoughts you may have!

C:\Users\ywu\Google Drive>jupyter notebook
[I 10:13:43.954 NotebookApp] Serving notebooks from local directory: C:\Users\ywu\Google Drive
[I 10:13:43.954 NotebookApp] 0 active kernels
[I 10:13:43.955 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=255a5c0c9af337a1c2187feb63f1c426fb903e5929a0b2f0
[I 10:13:43.956 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 10:13:43.959 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=255a5c0c9af337a1c2187feb63f1c426fb903e5929a0b2f0
[I 10:13:44.264 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[W 10:13:44.319 NotebookApp] 404 GET /api/kernels/aceb78ee-73e4-4481-9993-63e5ee8f72cb/channels?session_id=AEA3C6B2B0A440FC84FF3BAF5F5CB615 (127.0.0.1): Kernel does not exist: aceb78ee-73e4-4481-9993-63e5ee8f72cb
[W 10:13:44.328 NotebookApp] 404 GET /api/kernels/aceb78ee-73e4-4481-9993-63e5ee8f72cb/channels?session_id=AEA3C6B2B0A440FC84FF3BAF5F5CB615 (127.0.0.1) 20.07ms referer=None
[I 10:13:54.740 NotebookApp] Creating new notebook in /code/python
[I 10:13:55.241 NotebookApp] Kernel started: 45ab2da6-7466-408c-aa5a-98f7db54e711
[W 10:14:00.341 NotebookApp] Replacing stale connection: aceb78ee-73e4-4481-9993-63e5ee8f72cb:AEA3C6B2B0A440FC84FF3BAF5F5CB615
Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Process SpawnPoolWorker-1:
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
Process SpawnPoolWorker-4:
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
Process SpawnPoolWorker-3:
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Process SpawnPoolWorker-5:
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
Process SpawnPoolWorker-6:
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
Traceback (most recent call last):
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
Process SpawnPoolWorker-7:
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-10:
Traceback (most recent call last):
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
Process SpawnPoolWorker-11:
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
Process SpawnPoolWorker-14:
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
Traceback (most recent call last):
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
Traceback (most recent call last):
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Users\ywu\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 345, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'guess_gender' on <module '__main__' (built-in)>
[W 10:14:15.043 NotebookApp] 404 GET /api/kernels/c1224db6-69c6-470e-b74c-4c7b94fb48fe/channels?session_id=D8DC8A440B044EED8EBCA374EBEAF7C6 (127.0.0.1): Kernel does not exist: c1224db6-69c6-470e-b74c-4c7b94fb48fe
[W 10:14:15.046 NotebookApp] 404 GET /api/kernels/c1224db6-69c6-470e-b74c-4c7b94fb48fe/channels?session_id=D8DC8A440B044EED8EBCA374EBEAF7C6 (127.0.0.1) 7.48ms referer=None
Steven C. Howell
  • 16,902
  • 15
  • 72
  • 97
Trexion Kameha
  • 3,362
  • 10
  • 34
  • 60
  • If you used jupyter, how is your code is split into cells? And What order did you run those cells? –  May 17 '18 at 04:59

4 Answers4

18

I got multiprocessing to work from within a Jupyter notebook on Windows by saving my function in a separate .py file and including that file in my notebook.

Example:

f.py:

def f(name, output):
  output.put('hello {0}'.format(name))
  return

Code in Jupyter notebook:

from multiprocessing import Process, Queue

#Having the function definition here results in
#AttributeError: Can't get attribute 'f' on <module '__main__' (built-in)>

#The solution seems to be importing the function from a separate file.

import f

#Also, the original version of f only had a print statement in it.  
#That doesn't work with Process - in the sense that it prints to the console 
#instead of the notebook.
#The trick is to let f write the string to print into an output-queue.
#When Process is done, the result is retrieved from the queue and printed.

if __name__ == '__main__':    

   # Define an output queue
   output=Queue()

   # Setup a list of processes that we want to run
   p = Process(target=f.f, args=('Bob',output))

   # Run process
   p.start()

   # Exit the completed process
   p.join()

   # Get process results from the output queue
   result = output.get(p)

   print(result)

I'm a Python newby and I may have missed all sorts of details, but this works for me.

blecourt
  • 181
  • 1
  • 2
7

After much research it appears that multiprocessing is not an option to use in a notebook on windows. I am closing but please open if you have a solution. I will switch over to pathos.

Trexion Kameha
  • 3,362
  • 10
  • 34
  • 60
  • Indeed it is officially not supported by multiprocessing module. Thanks for posting that instead it is supported by the pathos module. https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers – Yuval Atzmon Oct 19 '18 at 11:13
3

This problem would be headache for people using Jupyter on windows. The code would run fine on Linux system.

In order to run the code on windows,

  1. Put the function definition in a separate ipynb file.
  2. import the file using from ipynb.fs.full.functions import func ( make sure you pip install ipynb first)
  3. This would definitely solve this.
1

How about this:

Code:

#!/usr/bin/env python3

import sys
import time
import gender_guesser.detector as gender
import pandas as pd
import multiprocessing as mp

d = gender.Detector()

def guess_gender(name):
    n = name.title()
    g = d.get_gender(n)
    return g

def run():
    ls = ['john','joe','amamda','derick','peter','ashley','john',\
          'joe','amamda','derick','peter','ashley']

    num_cpus = mp.cpu_count() - 1
    pool = mp.Pool(processes=num_cpus)
    result = pool.map(guess_gender, ls)

    df = pd.DataFrame(result, columns=["gender"])

    print("\ntook {} secs to classify\n".format(str(time.time() - st)))
    print(df)  # or you could save the dataframe using .to_csv()

st = time.time()

if __name__ == "__main__":
    run()

Output:

took 0.0150408744812 secs to classify

           gender
0            male
1            male
2         unknown
3            male
4            male
5   mostly_female
6            male
7            male
8         unknown
9            male
10           male
11  mostly_female
o-90
  • 17,045
  • 10
  • 39
  • 63
  • I still get AttributeError: Can't get attribute 'guess_gender' on – Trexion Kameha Aug 16 '17 at 18:10
  • I'm copying and pasting the exact code I wrote and running it in a python 3.6.1 env created with Anaconda. It runs and produces the output I've shown. I can't reproduce your problem, did you copy the exact code? – o-90 Aug 16 '17 at 18:24
  • yes I did. still get the error in my cmd window: "AttributeError: Can't get attribute 'guess_gender' on " – Trexion Kameha Aug 17 '17 at 05:14
  • 3
    I should add I'm running in a jupyter notebook. My apologies as I am a n00b. – Trexion Kameha Aug 20 '17 at 20:28
  • For people stumbling on the code by gobrewers14: the problem in the original post is Windows-related. Judging from the code above gobrewers14 is linux/unix-based and the code in that environment works. In Windows the same code doesn't work. – manu3d Jun 04 '19 at 20:25
  • The code does not work in windows. The bug is here https://bugs.python.org/issue25053 – Akhil Jain Jan 28 '21 at 03:38