This error has been bugging me for a few hours now. I wrote a separate minimal project to see whether I could reproduce it, and I can, but ONLY on my server. The same code works on my Mac.

  • Mac: OSX El Capitan 10.11.6

  • Server: CentOS 7.2.1511

  • Both have PhantomJS version: 2.1.1

  • Python Mac: Python 2.7.11

  • Python Server: 2.7.5

  • Both have Selenium version: 2.53.0

Identical code was run on both:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.common.exceptions import NoSuchElementException
import time

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36"
dcap["phantomjs.page.customHeaders.accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
dcap["phantomjs.page.customHeaders.Accept-Language"] = "en-US,en;q=0.8"
dcap["phantomjs.page.customHeaders.connection"] = "keep-alive"

driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1120, 700)
driver.get("https://www.instagram.com/espn/")

while True:
    print len(driver.find_elements_by_css_selector("a[href*='/p/']"))
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        loadMore = driver.find_element_by_link_text("Load more")
        loadMore.click()
    except NoSuchElementException:
        print "No such"
        driver.save_screenshot('none.png')

Mac output:

12
24
No such
24
No such
36
No such
48
No such
48
No such
60
No such
72
No such
84
# This goes until I end it

Server output:

12
24
No such
Traceback (most recent call last):
  File "junk.py", line 27, in <module>
    driver.save_screenshot('none.png')
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 790, in get_screenshot_as_file
    png = self.get_screenshot_as_png()
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 809, in get_screenshot_as_png
    return base64.b64decode(self.get_screenshot_as_base64().encode('ascii'))
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 819, in get_screenshot_as_base64
    return self.execute(Command.SCREENSHOT)['value']
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

Server output after removing the screenshot line:

12
24
No such
24
Traceback (most recent call last):
  File "junk.py", line 23, in <module>
    loadMore = driver.find_element_by_link_text("Load more")
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
    {'using': by, 'value': value})['value']
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1217, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

One related answer I found was here: Can't run PhantomJS in python via Selenium

So I installed Selenium 2.37 and it gave the same error.

I read this answer suggesting the problem might be related to changing the headers, so I removed the headers by changing the driver to driver = webdriver.PhantomJS(), and I still get the same error.

I also installed Python 2.7.12 on the server to see if it made a difference. The output was:

# python2.7 junk.py
12
24
No such
24
Traceback (most recent call last):
  File "junk.py", line 29, in <module>
    loadMore = driver.find_element_by_link_text("Load more")
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 744, in find_element
    {'using': by, 'value': value})['value']
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/local/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/lib/python2.7/urllib2.py", line 1201, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/local/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/usr/local/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

Checking space on system. It's a brand new VPS, but still, to confirm:

[Screenshot: free disk space on the server]

User
  • some useful info here http://stackoverflow.com/a/27620850/2575259 – Naveen Kumar R B Nov 25 '16 at 08:20
  • @Naveen I'm having a hard time understanding how that http lib is even called when the call is already made. For example, a screenshot? – User Nov 25 '16 at 08:56
  • please read json-wire protocol here https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol – Naveen Kumar R B Nov 25 '16 at 08:59
  • In the exception trace you can see how a WebDriver method (find_element_by_link_text) is treated as a command, turned into an HTTP request, and sent to the specific WebDriver implementation (here, PhantomJS), which does the work and sends an HTTP response back. In your case, that HTTP request came back with a status line that the WebDriver client bindings didn't understand – Naveen Kumar R B Nov 25 '16 at 09:06
  • any update on the progress? – Naveen Kumar R B Nov 28 '16 at 16:30
  • @Naveen Right now I'm looking for a compatible version of PhantomJS with Selenium. There doesn't seem to be any info available on that. – User Nov 28 '16 at 22:28
  • I am only get `while True: Rape_To_Server_OR_Ethernet_Interface`. [Please read this document](https://wiki.python.org/moin/WhileLoop). I am not offer any `while True:` on trial version codes. A little trick `while True( action = None ; while not action (action = some_action()); play_with_action();)` – dsgdfg Nov 30 '16 at 09:09
  • @dsgdfg This has nothing to do with the problem. `while True` and then using a break statement inside a nested conditional is fine. – User Nov 30 '16 at 21:56
  • Over long experience I have found that things work best if I sleep for 500 milliseconds immediately after a scrollTo. Hypothesis is that the scrollTo is still executing when you try to save the screen. So try putting the delay in and see if it changes the behavior. If it doesn't we start generating more hypotheses. – MikeJRamsey56 Dec 01 '16 at 00:30
  • @MikeJRamsey56 after the `scrollTo` line, I tried a 0.5 and 5 second delay, both made identical progress and output as without. – User Dec 01 '16 at 00:37
  • The failure occurred during file processing. Hypothesis 2: Permissions problem on the directory where the file is being created. Hypothesis 2a: Space problem in the file system that contains the directory where the file is to be created. Care to check? If false we keep going with more hypotheses. – MikeJRamsey56 Dec 01 '16 at 02:14
  • @MikeJRamsey56 It's a new VPS, so space is plenty, but also added a screenshot to the post. As for permissions, I set the parent directory to 777 and I set the python script to 777. – User Dec 01 '16 at 05:17
  • @User This not `while True` i offer you `while no_data_wait_for_data`. You cant set a timer on all TCP_CONNECTION, always need waiting server + waiting prepare server answer. Information : Some VPS service using additional external gateway(dunno for what ?). Dropped a lot packet without any reason. Company told me "This is your problem, all service is OK", but i got 1.2GB log for what happening ! So on VPS more premium users can play with your data easily. Open a ticket for nothing, my offer `change your VPS location immediately` – dsgdfg Dec 01 '16 at 05:51
  • Hypothesis 3: It has something to do with the default location of the saved file. Please replace your driver.save_screenshot('none.png') with driver.get_screenshot_as_file('/tmp/none.png'). – MikeJRamsey56 Dec 01 '16 at 12:58
  • @MikeJRamsey56 Tried it, nothing. Also, if I remove the screenshot line, the same thing happens. – User Dec 01 '16 at 18:47
  • This is looking more and more like a python problem. Tonight I will install your code in one of my CentOS7 guests and see what is going on. – MikeJRamsey56 Dec 01 '16 at 19:19
  • As an aside, http_client.BadStatusLine is issued if the phantomJS driver shuts down before selenium reads the response from the RemoteDriver. Not the only reason but a reason .... – MikeJRamsey56 Dec 01 '16 at 19:29
  • Can you try changing locator type for loadMore = driver.find_element_by_link_text("Load more") with any other locator and see if it helps. – Abhinav Dec 02 '16 at 09:57
  • @Abhinav What do you mean? I don't think there's any other selector for that button. You want me to try Xpath? – User Dec 02 '16 at 17:33
  • @User Yes. please try XPath and see if that helps you out – Abhinav Dec 03 '16 at 13:41
  • @Abhinav Ok done. Same thing. `loadMore = driver.find_element_by_xpath("//a[text()='Load more']")` – User Dec 03 '16 at 16:11
  • Try this: loadMore = driver.find_element_by_xpath("//a[contains(text(),'Load more')]") – Abhinav Dec 03 '16 at 17:25
  • Hey there! Could you check if the response from the server is indeed an HTTP response? Consider posting the response to http://chat.stackoverflow.com/rooms/129978/40799703 since this comment thread has become really long. – pradyunsg Dec 07 '16 at 08:09
  • @pradyunsg How do I do that? – User Dec 07 '16 at 08:12
  • Check out http://stackoverflow.com/a/10721643/1931274 – pradyunsg Dec 07 '16 at 08:14
  • Can you summarize the relevant points from this comment thread and from the previously accepted answer, and update your question? At the moment it's very hard to understand what has and hasn't been tried because of the very long comment threads – e4c5 Dec 09 '16 at 04:16
  • It looks like an issue with phantomjs or the returned page... Do you have any logs from phantomjs, or proof that you're getting a valid page from the server? – Peter Brittain Dec 09 '16 at 07:44
  • @PeterBrittain Where can I get the logs from? I can't find anything online. – User Dec 09 '16 at 21:54
  • See http://stackoverflow.com/questions/14699718/how-do-i-set-a-proxy-for-phantomjs-ghostdriver-in-python-webdriver for how to pass args to phantomjs and http://phantomjs.org/api/command-line.html for CLI options. – Peter Brittain Dec 09 '16 at 22:42
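
To illustrate the point made in the comments above: the Python bindings talk to PhantomJS over plain HTTP, and an empty status line is what the client sees when the other end closes the connection without answering. Here is a minimal sketch that reproduces this with the standard library only. The names start_silent_server and probe are made up for this illustration, and the silent server merely stands in for a driver process that went away; it says nothing about why PhantomJS dies on the CentOS server.

```python
import socket
import threading

try:
    import httplib  # Python 2, as used in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def start_silent_server():
    """Listen on a free local port, accept one connection, close it without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        conn.recv(4096)  # read the request, then hang up without a status line
        conn.close()
        listener.close()

    worker = threading.Thread(target=serve_once)
    worker.daemon = True
    worker.start()
    return port


def probe(port):
    """Send one GET request and report how the client sees the reply."""
    client = httplib.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/status")
        client.getresponse()
        return "got a response"
    except httplib.BadStatusLine:
        # Python 3 raises RemoteDisconnected here, a subclass of BadStatusLine
        return "BadStatusLine"
    finally:
        client.close()


if __name__ == "__main__":
    print(probe(start_silent_server()))
```

If this behaves the same way on the server, it suggests that the traceback in the question points at the PhantomJS process dropping the connection rather than at urllib2 or httplib themselves.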

1 Answer

EDIT 3

Add the following handler to the try block inside the loop (it needs import httplib):

except httplib.BadStatusLine:
    pass
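
A caveat on swallowing the exception: if the PhantomJS process has actually died, later calls will probably fail the same way, so pass on its own may only hide the problem. One possible variation, sketched here under that assumption, is to start a fresh driver and retry. call_with_restart, action and make_driver are hypothetical names for this sketch, not part of Selenium.

```python
try:
    import httplib  # Python 2, as used in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def call_with_restart(action, make_driver, driver, max_restarts=2):
    """Run action(driver); on BadStatusLine, build a fresh driver and try again.

    Returns (result, driver) so the caller keeps using whichever driver is live.
    The old driver is not quit here, since a dead driver may not answer quit().
    """
    restarts = 0
    while True:
        try:
            return action(driver), driver
        except httplib.BadStatusLine:
            if restarts >= max_restarts:
                raise  # give up and let the caller see the original error
            restarts += 1
            driver = make_driver()
```

With the question's code, action could be a small function that loads the page and counts the links, and make_driver could be lambda: webdriver.PhantomJS(desired_capabilities=dcap). Note that a new driver starts on a blank page, so the action has to call driver.get again.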

EDIT 2

The Python WebDriver bindings and PhantomJS have a problem with keep_alive, which could be what you are hitting. Try adding keep_alive=False as follows:

driver = webdriver.PhantomJS(desired_capabilities=dcap,keep_alive=False)

end edit


Add the following

import httplib
import socket

from selenium.webdriver.remote.command import Command

def get_status(driver):
    try:
        driver.execute(Command.STATUS)
        return "Alive"
    except (socket.error, httplib.CannotSendRequest):
        return "Dead"

Call get_status(driver) just before the save_screenshot statement and print the result. This will tell us whether the driver has shut down prematurely.
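
In the comments below, the asker reports that get_status itself raised httplib.BadStatusLine, and later a urllib2.URLError, neither of which the except clause above catches. Here is a hedged variant that treats any of those errors as "Dead". To keep the sketch runnable without Selenium it takes a callable, here called probe (a hypothetical parameter), instead of the driver.

```python
import socket

try:
    import httplib  # Python 2, as used in the question
    from urllib2 import URLError
except ImportError:
    import http.client as httplib  # Python 3 equivalents
    from urllib.error import URLError

# HTTPException is the base class of BadStatusLine and CannotSendRequest.
DEAD_ERRORS = (socket.error, httplib.HTTPException, URLError)


def get_status(probe):
    """Return "Alive" if probe() completes, "Dead" if it fails at the HTTP or socket level."""
    try:
        probe()
        return "Alive"
    except DEAD_ERRORS:
        return "Dead"
```

With a real driver the call would be get_status(lambda: driver.execute(Command.STATUS)). Keep in mind that this only reports the state at the moment of the probe; the driver can still die on the very next command.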

EDIT

Add the following after driver = webdriver.PhantomJS(desired_capabilities=dcap)

driver.implicitly_wait(10)  # wait up to 10 seconds for find_element calls before giving up
MikeJRamsey56
  • Interesting. So it gives `httplib.BadStatusLine: ''` on `print get_status(driver)` – User Dec 01 '16 at 19:59
  • The driver is so dead that it can't tell us that it is dead? What if you call get_status right after driver.get("https://www.instagram.com/espn/") – MikeJRamsey56 Dec 01 '16 at 20:02
  • It says `Alive` on the first line, then the regular. – User Dec 01 '16 at 20:05
  • @User I edited the answer to make a suggestion. If that fails then what happens if in conjunction with the edited change you insert a call to get_status(driver) just before loadMore = driver.find_element_by_link_text("Load more") ? – MikeJRamsey56 Dec 01 '16 at 21:20
  • Added it after print len(find elements) and before the scroll. Output was: `Alive 12 24` then it threw the same exception on `driver.find_element_by_link_text("Load more")` – User Dec 01 '16 at 22:26
  • Also, double checking that phantomjs-2.1.1-linux-x86_64.tar.bz2 is installed on linux. – MikeJRamsey56 Dec 01 '16 at 22:35
  • `phantomjs -v` outputs `2.1.1 ` – User Dec 01 '16 at 22:36
  • Looks like phantomJs is crapping out on the call. This will have to wait until I can install a test instance. The linux phantomJs is a different install than the Mac. I have no idea what would happen if the Mac OS/X version was installed on linux; don't even know if it is possible. But the thought, momentarily, crossed my mind. What if you reinstalled phantomjs on linux? – MikeJRamsey56 Dec 01 '16 at 22:41
  • It is keep_alive. See EDIT 2. – MikeJRamsey56 Dec 02 '16 at 02:25
  • `TypeError: __init__() got an unexpected keyword argument 'keep_alive'` – User Dec 02 '16 at 02:31
  • Try keep-alive instead. – MikeJRamsey56 Dec 02 '16 at 02:39
  • Oh, maybe this should go in the headers, `dcap`? – User Dec 02 '16 at 02:40
  • Maybe. Try setting it to False in the header. If that fails, try removing it from the header. – MikeJRamsey56 Dec 02 '16 at 03:03
  • I tried `dcap["phantomjs.page.customHeaders.connection"] = "close"` and I also removed that line. Didn't work. – User Dec 02 '16 at 03:07
  • Scratch last. Hmmm. – MikeJRamsey56 Dec 02 '16 at 03:10
  • Yeah, that's what I meant. I removed that `customHeaders.connection` line after. – User Dec 02 '16 at 03:11
  • We know that phantomJs driver died but we don't know why. – MikeJRamsey56 Dec 02 '16 at 03:25
  • `urllib2.URLError: ` at `print len(driver.find_elements_by_css_selector("a[href*='/p/']"))` – User Dec 02 '16 at 03:55
  • Another thing I've tried: I changed my IP address with a proxy to see if the results were different, and they weren't. – User Dec 02 '16 at 04:13
  • Not necessarily! I put `if get_status(driver) == "Alive": driver.save_screenshot('/var/www/none.png')` and still get an exception on the screenshot line. – User Dec 03 '16 at 17:57
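
Since the thread ends without knowing why the driver died, the earlier suggestion to collect PhantomJS logs seems like the natural next step. Here is a rough sketch of how that might look, assuming the service_args and service_log_path parameters of webdriver.PhantomJS and the --webdriver-loglevel and --debug command-line options of PhantomJS. The helper names and the log path are made up for this illustration.

```python
def phantomjs_debug_kwargs(log_path="ghostdriver.log"):
    """Keyword arguments for webdriver.PhantomJS that turn on verbose logging."""
    return {
        "service_args": [
            "--webdriver-loglevel=DEBUG",  # GhostDriver log verbosity
            "--debug=true",                # PhantomJS's own debug messages
        ],
        "service_log_path": log_path,      # file that receives the driver output
    }


def start_logged_driver(log_path="/tmp/ghostdriver.log"):
    """Start PhantomJS with the logging options above (needs Selenium and PhantomJS)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    return webdriver.PhantomJS(**phantomjs_debug_kwargs(log_path))
```

Running the question's loop with driver = start_logged_driver() and then reading the last lines of the log file after the crash may show whether PhantomJS itself reported an error before the connection dropped.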