I'm trying to set a proxy for webscraping using selenium + phantomjs. I'm using python.
I've seen in many places that there is a bug in phantomjs such that proxy-auth does not work.
from selenium.webdriver.common.proxy import *
from selenium import webdriver
from selenium.webdriver.common.by import By
service_args = [
'--proxy=http://fr.proxymesh.com:31280',
'--proxy-auth=USER:PWD',
'--proxy-type=http',
]
driver = webdriver.PhantomJS(service_args=service_args)
driver.get("https://www.google.com")
print driver.page_source
Proxy mesh suggests using the following instead:
page.customHeaders={'Proxy-Authorization': 'Basic '+btoa('USERNAME:PASSWORD')};
but I'm not sure how to translate that into python.
This is what I currently have:
from selenium import webdriver
import base64
from selenium.webdriver.common.proxy import *
from selenium import webdriver
from selenium.webdriver.common.by import By
service_args = [
'--proxy=http://fr.proxymesh.com:31280',
'--proxy-type=http',
]
headers = { 'Proxy-Authorization': 'Basic ' + base64.b64encode('USERNAME:PASSWORD')}
for key, value in enumerate(headers):
webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value
driver = webdriver.PhantomJS(service_args=service_args)
driver.get("https://www.google.com")
print driver.page_source
but it doesn't work.
Any suggestions for how I could get this to work?