
I've seen lots of examples of this approach, and occasionally (about 2 times out of 100) it works for me, but most of the time it doesn't, and I can't see why. Any ideas are much appreciated!

I'm not that familiar with using proxies. I suspect that, for the valid proxies that return an empty page rather than an error, the data is simply not being passed back through, but I'm not sure how to test this further.

The specs are CentOS 7, Selenium 3.6.0, and PhantomJS 2.1.1.

import os, requests
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

os.environ["PATH"] += os.pathsep + '/path/to/executable'

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = ( "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36" )

url = 'https://httpbin.org/ip'
proxy = 'xxx.xxx.xxx.xxx:xxxx'

# requests indicates that the proxy is valid 99% of the time
response = requests.get(url, proxies={"http": proxy, "https": proxy})
print(response.json())

service_args = [
    '--ignore-ssl-errors=true',
    '--proxy=' + proxy,
    '--proxy-type=http',
    '--ssl-protocol=any'
]

# 98% of the time this outputs u'<html><head></head><body></body></html>'
browser = webdriver.PhantomJS(desired_capabilities=dcap, service_args=service_args)
browser.get(url)
print(browser.page_source)
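
One way to test further, as a rough sketch rather than a confirmed fix: run PhantomJS with its own debug output switched on and keep the driver log, so a failed proxy connection or TLS handshake has a chance of showing up somewhere other than an empty page. The helper names `phantomjs_debug_args` and `fetch_with_logging` and the log file name are made up for illustration. `--debug` and `--webdriver-loglevel` are PhantomJS command-line options, and `service_log_path` is the Selenium parameter for the driver log. The sketch assumes the debug output ends up in that log file.

```python
def phantomjs_debug_args(proxy):
    """Return the question's PhantomJS flags plus verbose logging flags."""
    return [
        '--ignore-ssl-errors=true',
        '--proxy=' + proxy,
        '--proxy-type=http',
        '--ssl-protocol=any',
        '--debug=true',                # PhantomJS's own debug messages
        '--webdriver-loglevel=DEBUG',  # verbose GhostDriver logging
    ]


def fetch_with_logging(url, proxy, log_path='ghostdriver.log'):
    """Load url through the proxy and return the page source.

    log_path is a hypothetical file name; inspect it after a run that
    returns an empty page.
    """
    # Imported here so the flag helper above can be used without Selenium.
    from selenium import webdriver

    browser = webdriver.PhantomJS(
        service_args=phantomjs_debug_args(proxy),
        service_log_path=log_path,
    )
    try:
        browser.get(url)
        return browser.page_source
    finally:
        # Always shut the PhantomJS process down, even if get() raises.
        browser.quit()
```

Calling `fetch_with_logging('https://httpbin.org/ip', proxy)` for a proxy that works with requests but not with PhantomJS, and then reading the log, should at least narrow down whether the request reached the proxy at all.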
  • are you sure your proxy server is good? – Reed Jones May 08 '18 at 23:08
  • I'm getting a list of hundreds from https://free-proxy-list.net/ and then testing each one. As I noted in the post, I loop through them and test each against requests for comparison, and 99% test as good... – observer7 May 08 '18 at 23:26
  • Yeah it's probably because of the proxy...when using these free proxies you should expect some quirks.... – Reed Jones May 09 '18 at 02:56
  • Well, perhaps I haven't presented the question clearly enough. I am running the same proxies through the Python requests library and almost all of them work. When I run the same proxies through PhantomJS, almost all of them don't work, although occasionally a few of them do. I'm trying to understand why a proxy that works via requests doesn't work via PhantomJS. This is how the examples I've seen use proxies in PhantomJS, and it's how I'm attempting to do it. It seems to have worked for others, so I'm wondering why it only occasionally works for me. – observer7 May 09 '18 at 10:22
  • I think perhaps selenium/phantom.js isn't responding to a redirect correctly.. but it's strange that it would work through requests and not selenium... check some of these links https://github.com/ariya/phantomjs/issues/10389 https://newspaint.wordpress.com/2013/04/25/getting-to-the-bottom-of-why-a-phantomjs-page-load-fails/ https://stackoverflow.com/questions/29358269/handling-redirection-w-phantomjs-selenium – Reed Jones May 09 '18 at 14:01
  • I was just hoping that someone had run into this before and had found a solution, or might be more familiar with the problem and have suggestions for solutions or debugging. – observer7 May 09 '18 at 19:45
  • Does anyone out there have a working PhantomJS setup using proxies? – observer7 May 10 '18 at 17:25
  • I've run into the same issue (works in requests, 98% fail in phantomjs). The best explanation I can find, though I can't confirm through my own testing, is that phantomjs doesn't handle TLS 1.0 correctly. And rather than fix it, the devs have basically thrown in the towel because of headless chrome. Try modifying your code to use headless chrome and see if you have better luck. EDIT: This is what I'm referring to https://stackoverflow.com/questions/23581291/python-selenium-with-phantomjs-empty-page-source – Enuratique Jul 16 '18 at 17:34
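
A minimal sketch of the headless Chrome route suggested in the last comment, untested against these particular proxies. It assumes Chrome and a matching chromedriver are installed and on the PATH. The helper names `chrome_proxy_arguments` and `fetch_with_headless_chrome` are hypothetical, while `--headless` and `--proxy-server` are standard Chrome command-line switches.

```python
def chrome_proxy_arguments(proxy):
    """Return Chrome switches for a headless run through an HTTP proxy."""
    return [
        '--headless',
        '--disable-gpu',
        '--proxy-server=http://' + proxy,
    ]


def fetch_with_headless_chrome(url, proxy):
    """Load url through the proxy in headless Chrome and return the source."""
    # Imported here so the switch helper above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_proxy_arguments(proxy):
        options.add_argument(argument)

    # Selenium 3.6 takes chrome_options=; later releases use options= instead.
    browser = webdriver.Chrome(chrome_options=options)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        # Always close the browser, even if get() raises.
        browser.quit()
```

Running `fetch_with_headless_chrome('https://httpbin.org/ip', proxy)` over the same proxy list would show whether the failures are specific to PhantomJS or shared by a current browser engine.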

0 Answers