I have these functions:

downloadXMLPool = Pool(processes=3)
downloadFile = Pool(processes=3)


def dl_data_and_convert(url,formatIn,retry=True,noHTML=False,downloadSpecialPool=None):
    if downloadSpecialPool == None:
        dlPool = downloadXMLPool
    else:
        dlPool = downloadSpecialPool

    if formatIn == 'xml':
        return dl_xml_to_dict(url,dlPool,retry=retry,noHTML=noHTML)
    elif formatIn == 'json':
        return dl_json_to_dict(url,dlPool,retry=retry,noHTML=noHTML)


def dl_xml_to_dict(url,dlPool,retry=True,tryNumber=1,noHTML=False):
    try:
        response = None
        doc = xmltodict.parse(dlPool.apply_async(requests.get, (url,)).get().content)

        if noHTML:
            if 'html' in doc:
                if retry:
                    sleep(randrange(60,120)*tryNumber)
                    return dl_xml_to_dict(url,dlPool,tryNumber=tryNumber+1,noHTML=noHTML)
        return doc

    except Exception as e:
        if retry and request_error_tester(tryNumber,url,response,e):
            return dl_xml_to_dict(url,dlPool,tryNumber=tryNumber+1,noHTML=noHTML)
        else:
            return None

These two functions work fine.

I also have these functions:

def download_file(url, NameFile, folder='./', retry=True, downloadSpecialPool=None):
    if downloadSpecialPool == None:
        return launch_dl_in_file(url, downloadFile, NameFile, folder=folder, retry=retry)
    else:
        return launch_dl_in_file(url, downloadSpecialPool, NameFile, folder=folder, retry=retry)


def launch_dl_in_file(url, dlPool, NameFile, folder='./', retry=True, tryNumber=1):
    try:
        dlPool.apply_async(dl_in_file, (url, NameFile, folder)).get()
        return True

    except Exception as e:
        if match('.*HTTP Error 504.*',str(e)) != None:
            if retry:
                if tryNumber < 21:
                    sleep(15*tryNumber)
                    return launch_dl_in_file(url, dlPool, NameFile, folder=folder, tryNumber=tryNumber+1)
        print_screen_error("Download error for file : "+url+"\n\tError : "+str(e)+"\n")
        return False


def dl_in_file(url, NameFile, folder='./'):
    with closing(urllib.request.urlopen(url)) as r:
        with open(os.path.join(folder,NameFile), 'wb') as f:
            shutil.copyfileobj(r, f)
    return True

When I call them, I get this error:

AttributeError: Can't get attribute 'dl_in_file' on <module 'tools' from '/home/*/src/tools.py'>

I tried several modifications: using another multiprocessing pool, using downloadXMLPool, writing a new function, and so on. I still get the same error, with this traceback:

Process ForkPoolWorker-4:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 354, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'dl_in_file' on <module 'tools' from '/home/*/src/tools.py'>

What could I be doing wrong here?

  • That solution does not work. If I move dl_in_file before launch_dl_in_file, I still get the same result. If I call dl_in_file from download_file, I have the same problem, and I don't understand why. I use the same logic in other software, and I only have the problem with these functions. – studyfranco Dec 14 '20 at 09:06
  • The linked answer tells you that instantiating the Pool _before_ you define your functions won't work; you first need to define all functions you want to use in the child process. – Darkonaut Dec 14 '20 at 11:25
  • I hadn't understood it that way. I modified my code and it works well! I am surprised, this seems very strange. Thank you for your help. – studyfranco Dec 14 '20 at 13:04

1 Answer

With Darkonaut's help, I found the solution. The original thread is: Python Multiprocessing Pool Map: AttributeError: Can't pickle local object

The solution is simple. You have to move:

downloadXMLPool = Pool(processes=3)
downloadFile = Pool(processes=3)

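after your function definitions. With the fork start method (the ForkPoolWorker in the traceback), the worker processes are created when the Pool is instantiated, so they only know the functions that were already defined at that point. The task is sent to the worker by name, and the worker cannot find dl_in_file in its copy of the module.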
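
As a rough illustration of that ordering, here is a minimal sketch. It is not the original tools.py: the body of dl_in_file is a stand-in that does no network access, make_pool is a hypothetical helper, and the URL is a placeholder. It also assumes the fork start method, which is what the traceback suggests and which is not available on Windows.

```python
import multiprocessing as mp


def dl_in_file(url, name_file):
    # Stand-in for the real download: no network access, just echo the inputs.
    return f"{name_file} <- {url}"


def launch_dl_in_file(pool, url, name_file):
    # The task is pickled by name, so the worker must already know dl_in_file.
    return pool.apply_async(dl_in_file, (url, name_file)).get(timeout=30)


def make_pool(processes=3):
    # Hypothetical helper. Assumption: "fork" start method (Unix only).
    # Workers are forked here, so they inherit every function defined above.
    return mp.get_context("fork").Pool(processes=processes)


if __name__ == "__main__":
    # The pool is created only after all worker functions exist.
    with make_pool() as download_file_pool:
        print(launch_dl_in_file(download_file_pool, "http://example.com/a.xml", "a.xml"))
```

Creating the pool inside a function or under the `if __name__ == "__main__":` guard, rather than at the top of the module, makes it hard to accidentally create it before the functions it needs.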