I'm trying to set a proxy for webscraping using selenium + phantomjs. I'm using python.

I've seen in many places that there is a bug in phantomjs such that proxy-auth does not work.

from selenium.webdriver.common.proxy import *
from selenium import webdriver
from selenium.webdriver.common.by import By
service_args = [
'--proxy=http://fr.proxymesh.com:31280',
'--proxy-auth=USER:PWD',
'--proxy-type=http',
]

driver = webdriver.PhantomJS(service_args=service_args)
driver.get("https://www.google.com")
print driver.page_source

Proxy mesh suggests using the following instead:

page.customHeaders={'Proxy-Authorization': 'Basic '+btoa('USERNAME:PASSWORD')};

but I'm not sure how to translate that into python.

This is what I currently have:

from selenium import webdriver
import base64
from selenium.webdriver.common.proxy import *
from selenium import webdriver
from selenium.webdriver.common.by import By

service_args = [
'--proxy=http://fr.proxymesh.com:31280',
'--proxy-type=http',
]

headers = { 'Proxy-Authorization': 'Basic ' +   base64.b64encode('USERNAME:PASSWORD')}

for key, value in enumerate(headers):
    webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value

driver = webdriver.PhantomJS(service_args=service_args)
driver.get("https://www.google.com")
print driver.page_source

but it doesn't work.

Any suggestions for how I could get this to work?

chris

3 Answers

I'm compiling answers from "How to correctly pass basic auth (every click) using Selenium and phantomjs webdriver" and "base64.b64encode error":

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import base64

service_args = [
    '--proxy=http://fr.proxymesh.com:31280',
    '--proxy-type=http',
]

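# b64encode returns bytes on Python 3; decode so it can be joined to the str prefix.
authentication_token = "Basic " + base64.b64encode(b'username:password').decode('ascii')

# Copy so the shared DesiredCapabilities.PHANTOMJS dict is not mutated.
capa = DesiredCapabilities.PHANTOMJS.copy()
capa['phantomjs.page.customHeaders.Proxy-Authorization'] = authentication_token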
driver = webdriver.PhantomJS(desired_capabilities=capa, service_args=service_args)

driver.get("http://...")
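
As a side note on why the loop in the question probably fails: `enumerate(headers)` yields `(index, key)` pairs rather than `(key, value)` pairs, so the capability ends up named `phantomjs.page.customHeaders.0` with the header name as its value. Below is a minimal sketch of building the capability entries with `.items()` instead. It uses a plain dict so it runs without Selenium, and the helper name `build_header_capabilities` is made up for illustration.

```python
import base64


def build_header_capabilities(headers, base_caps=None):
    """Return a copy of base_caps with one customHeaders entry per header."""
    caps = dict(base_caps or {})
    for name, value in headers.items():  # (key, value) pairs, unlike enumerate()
        caps['phantomjs.page.customHeaders.{}'.format(name)] = value
    return caps


# b64encode returns bytes on Python 3, so decode before concatenating.
token = base64.b64encode(b'USERNAME:PASSWORD').decode('ascii')
headers = {'Proxy-Authorization': 'Basic ' + token}

# What the question's loop actually iterates over:
broken_pairs = list(enumerate(headers))  # [(0, 'Proxy-Authorization')]

caps = build_header_capabilities(headers)
```

The resulting dict could then be merged into a copy of `DesiredCapabilities.PHANTOMJS` and passed as `desired_capabilities`, as in the snippet above.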
Community
  • Incredible. been stuck on this for weeks. Thanks! – chris Sep 22 '16 at 22:18
  • did you receive the bounty? I've never used it before so I don't know if I have to do anything else to give you the bounty. – chris Sep 23 '16 at 13:26
  • I think there is a bounty award button for you somewhere (but it should award it when the bounty expires anyway because you accepted the answer): http://stackoverflow.com/help/bounty –  Sep 23 '16 at 14:31

The solution with DesiredCapabilities didn't work for me. I ended up with the following solution:

from selenium import webdriver

# config.PHANTOMJS_PATH and the self.proxy* attributes are defined elsewhere.
driver = webdriver.PhantomJS(
    executable_path=config.PHANTOMJS_PATH,
    service_args=[
        '--ignore-ssl-errors=true',
        '--ssl-protocol=any',
        '--proxy={}'.format(self.proxy),
        '--proxy-type=http',
        '--proxy-auth={}:{}'.format(self.proxy_username, self.proxy_password),
    ])
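
If the arguments need to be assembled in more than one place, the list could be built by a small helper. This is only a sketch: the function name `build_proxy_service_args` and the sample host and credentials are placeholders, and only the PhantomJS flags themselves come from the snippet above.

```python
def build_proxy_service_args(proxy, username=None, password=None, proxy_type='http'):
    """Build the PhantomJS service_args list for a proxy (hypothetical helper)."""
    args = [
        '--ignore-ssl-errors=true',
        '--ssl-protocol=any',
        '--proxy={}'.format(proxy),
        '--proxy-type={}'.format(proxy_type),
    ]
    # Only add --proxy-auth when both credentials are supplied.
    if username is not None and password is not None:
        args.append('--proxy-auth={}:{}'.format(username, password))
    return args


# Placeholder host and credentials.
service_args = build_proxy_service_args('fr.proxymesh.com:31280', 'USER', 'PWD')
```

The returned list would be passed as `service_args=` to `webdriver.PhantomJS(...)` exactly as above.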
buhtla

None of the above methods worked for me. I am using ProxyMesh proxies with Selenium, PhantomJS, and Python. The following parameters worked for me and resolved the "proxy authentication failed" error:

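from selenium import webdriver

# Credentials are passed both in the proxy URL and via --proxy-auth.
service_args = ['--proxy=http://username:password@host:port',
                '--proxy-type=http',
                '--proxy-auth=username:password']

driver = webdriver.PhantomJS(service_args=service_args)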