I have code that scrapes the friend list for a given Facebook UID. It works, but scraping a whole list takes a long time, so I want to speed it up with multiprocessing and Selenium Grid. This is the approach I have in mind:
- Log in to Facebook with one account
- Open 5 Firefox instances that share the same cache and cookies (so I don't need to log in again)
- Scrape the friend lists of 5 different UIDs simultaneously, one instance per UID
This is my code, but it doesn't work:
import multiprocessing
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
from selenium import webdriver

def friend_uid_list(uid, driver):
    driver.get('https://www.facebook.com/' + str(uid) + '/friends')
    # scrape friend list
    driver.close()

def g(arg):
    return friend_uid_list(*arg)

if __name__ == '__main__':
    driver = webdriver.Firefox()
    driver.get("https://www.facebook.com/")
    driver.find_element_by_css_selector("#email").send_keys("email@gmail.com")
    driver.find_element_by_css_selector("#pass").send_keys("password")
    driver.find_element_by_css_selector("#u_0_m").click()
    pool = multiprocessing.Pool(5)
    pool.map(g, [(100004159542140, driver), (100004159542140, driver), (100004159542140, driver)])
So, can you show me how to use Selenium Grid to run multiple instances simultaneously? I have searched a lot but don't know how to apply it to my code. Thank you :)
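
For reference, here is a rough sketch of the approach listed above, not a tested solution. It assumes Selenium 4, a Grid hub reachable at `http://localhost:4444/wd/hub` (an assumed address), and that a driver object cannot be shared between processes, so each worker opens its own session and receives only the login cookies. The helper names (`friends_url`, `new_grid_driver`, `login_cookies`, `scrape_friends_page`, `scrape_all`) are hypothetical, and the selectors are copied from the code above.

```python
from functools import partial
from multiprocessing import Pool

GRID_URL = "http://localhost:4444/wd/hub"  # assumption: hub running locally
BASE_URL = "https://www.facebook.com/"


def friends_url(uid):
    """Build the friends-page URL; str() lets uid be an int or a string."""
    return BASE_URL + str(uid) + "/friends"


def new_grid_driver():
    """Open a fresh Firefox session on the grid (Selenium 4 style call)."""
    from selenium import webdriver  # imported here so each worker loads it itself
    return webdriver.Remote(command_executor=GRID_URL,
                            options=webdriver.FirefoxOptions())


def login_cookies(email, password):
    """Log in once and return the session cookies as a list of plain dicts."""
    from selenium.webdriver.common.by import By
    driver = new_grid_driver()
    try:
        driver.get(BASE_URL)
        driver.find_element(By.CSS_SELECTOR, "#email").send_keys(email)
        driver.find_element(By.CSS_SELECTOR, "#pass").send_keys(password)
        driver.find_element(By.CSS_SELECTOR, "#u_0_m").click()
        # A real script would wait here until the login has completed.
        return driver.get_cookies()
    finally:
        driver.quit()


def scrape_friends_page(uid, cookies):
    """Worker: own browser session, reuse the cookies, return the page HTML."""
    driver = new_grid_driver()
    try:
        driver.get(BASE_URL)  # cookies can only be added for the current domain
        for cookie in cookies:
            driver.add_cookie(cookie)
        driver.get(friends_url(uid))
        return driver.page_source  # parse this with BeautifulSoup afterwards
    finally:
        driver.quit()


def scrape_all(uids, cookies, workers=5):
    """Run one worker per UID, at most `workers` browser sessions at a time."""
    with Pool(workers) as pool:
        return pool.map(partial(scrape_friends_page, cookies=cookies), uids)
```

The idea would be to call `cookies = login_cookies("email@gmail.com", "password")` and then `scrape_all([uid1, uid2, ...], cookies)` from inside the `if __name__ == '__main__':` block. Only the cookie dicts and the UIDs are passed to the pool, since those can be pickled while a live driver cannot be handed to another process.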