I have to scrape some websites and want to speed up the process. I'm trying to use the multiprocessing module to split up the work so I can scrape two websites concurrently.
from multiprocessing import Process, current_process, Pool
from selenium import webdriver
import os
from functools import partial

def pool_function(url, browser):
    name = current_process().name
    print('Process name: %s' % name)
    print(url)
    print('')
    browser.get(url)

if __name__ == '__main__':
    list_of_urls = ['http://www.linkedin.com', 'http://www.amazon.com', 'http://www.uber.com', 'http://www.facebook.com']
    p = Pool(processes=2)
    browser = webdriver.Chrome()
    test = partial(pool_function, browser=browser)
    results = p.map(test, list_of_urls)
    p.close()
    p.join()
    print('all done')
This is giving me an error that the browser object can't be pickled:
Traceback (most recent call last):
File "/Users/morganallen/Desktop/multi_processing.py", line 43, in <module>
results = p.map(test, list_of_urls)
File "/anaconda2/lib/python2.7/multiprocessing/pool.py", line 253, in map
return self.map_async(func, iterable, chunksize).get()
File "/anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
What would be a way to create an instance of the browser
and then have copies pop up in individual processes that I can feed URLs to?
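One pattern that sidesteps the pickling error entirely: never send the driver between processes. Give the Pool an initializer that creates one browser per worker process and stores it in a module-level global, so only the URL strings (which are picklable) ever cross the process boundary. A minimal sketch of the pattern, using a hypothetical FakeBrowser stand-in so it runs without Selenium; in real scraping code, init_worker would do browser = webdriver.Chrome() instead:

```python
from multiprocessing import Pool, current_process

class FakeBrowser:
    """Hypothetical stand-in for webdriver.Chrome() so the sketch runs without Selenium."""
    def get(self, url):
        return '%s fetched %s' % (current_process().name, url)

browser = None  # one per worker process, set by the initializer

def init_worker():
    # Runs once in each worker process when the pool starts.
    global browser
    browser = FakeBrowser()  # real code: browser = webdriver.Chrome()

def fetch(url):
    # Uses this worker's own browser; only the url string was pickled.
    return browser.get(url)

if __name__ == '__main__':
    urls = ['http://www.linkedin.com', 'http://www.amazon.com']
    p = Pool(processes=2, initializer=init_worker)
    results = p.map(fetch, urls)
    p.close()
    p.join()
    print(results)
```

With two workers and four URLs, each worker keeps reusing its single browser instance, which is usually what you want anyway since starting Chrome is expensive.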
Update:
I tried it with pool.apply_async,
but I still can't pickle the browser object.
def browser():
    browser = webdriver.Chrome()
    return browser

# print('all done')

if __name__ == '__main__':
    list_of_urls = ['http://www.linkedin.com', 'http://www.amazon.com', 'http://www.uber.com', 'http://www.facebook.com']
    p = Pool(processes=2)
    y = browser()
    results = [p.apply_async(pool_function, args=(url, y)) for url in list_of_urls]
    print(results)
    output = [res.get() for res in results]
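The apply_async variant hits the same wall for the same reason: y is a webdriver object in args, so it has to be pickled to reach the worker. The fix is again to pass only picklable arguments and collect each AsyncResult with .get(), without shadowing the pool variable p in the comprehension. A sketch under that assumption, with a hypothetical work function standing in for the real scraping so it runs anywhere:

```python
from multiprocessing import Pool

def work(url):
    # Hypothetical stand-in for the real scraping function; it takes only
    # a picklable argument (a string), so apply_async can ship it to a worker.
    return len(url)

if __name__ == '__main__':
    urls = ['http://www.linkedin.com', 'http://www.amazon.com']
    p = Pool(processes=2)
    async_results = [p.apply_async(work, args=(url,)) for url in urls]
    output = [res.get() for res in async_results]  # res, not p: don't shadow the pool
    p.close()
    p.join()
    print(output)  # [23, 21]
```

Combined with the initializer pattern above, work would call the worker's own module-level browser instead of receiving one as an argument.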