I'm using Selenium through Python on an AWS Ubuntu server to scrape dynamic pages that rely on JavaScript (I need the fully rendered HTML). I finally got it working for most websites (thanks to the question "unable to call firefox from selenium in python on AWS machine"), but a few sites consistently give me a "Problem loading page" error.
In IPython:
from pyvirtualdisplay import Display
from selenium import webdriver
display = Display(visible=0, size=(1024, 768))
display.start()
driver = webdriver.Firefox()
actions = webdriver.ActionChains(driver)
The following work fine (and respond quickly):
driver.get('http://www.apple.com/')
print driver.title
>> Apple
driver.get('http://www.orange.com/')
print driver.title
>> Orange.com: Corporate Website of Orange
But the following hangs for 2-3 minutes and then finally returns with a "Problem loading page" title:
driver.get('http://www.trivago.com/')
print driver.title
>> Problem loading page
Here's some more info on the attributes of the driver at that point, in case it helps:
{'_is_remote': False,
'binary': <selenium.webdriver.firefox.firefox_binary.FirefoxBinary at 0x296d590>,
'capabilities': {u'acceptSslCerts': True,
u'applicationCacheEnabled': True,
u'browserConnectionEnabled': True,
u'browserName': u'firefox',
u'cssSelectorsEnabled': True,
u'databaseEnabled': True,
u'handlesAlerts': True,
u'javascriptEnabled': True,
u'locationContextEnabled': True,
u'nativeEvents': True,
u'platform': u'Linux',
u'rotatable': False,
u'takesScreenshot': True,
u'version': u'26.0',
u'webStorageEnabled': True},
'command_executor': <selenium.webdriver.firefox.extension_connection.ExtensionConnection at 0x296d6d0>,
'error_handler': <selenium.webdriver.remote.errorhandler.ErrorHandler at 0x7f14f4d4bf50>,
'profile': <selenium.webdriver.firefox.firefox_profile.FirefoxProfile at 0x2418cd0>,
'session_id': u'ece53830-2b9d-4a32-b692-777602190d0c'}
The same URLs all load fine when I run the same code locally (from the terminal on my Mac).
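For anyone trying to reproduce this, a small helper along the following lines could bound the wait and flag the error page instead of blocking for minutes. This is only a sketch: `timed_get`, `ERROR_TITLE` and the 30-second default are hypothetical names and values, not part of my code above. `set_page_load_timeout` is a standard WebDriver method, but whether Firefox 26 honors it on this setup is an assumption I have not verified.

```python
import time

# Title Firefox showed on the failing page in the session above.
ERROR_TITLE = "Problem loading page"


def timed_get(driver, url, timeout_s=30):
    """Load url and return (title, elapsed_seconds, failed).

    `driver` is any object exposing the WebDriver methods used here
    (set_page_load_timeout, get, title). The timeout value is an
    illustrative assumption, not a recommended setting.
    """
    driver.set_page_load_timeout(timeout_s)
    start = time.time()
    try:
        driver.get(url)
        title = driver.title
    except Exception as exc:  # e.g. selenium's TimeoutException
        title = "load aborted: %s" % type(exc).__name__
    elapsed = time.time() - start
    # Treat both the Firefox error page and an aborted load as failures.
    failed = title == ERROR_TITLE or title.startswith("load aborted")
    return title, elapsed, failed
```

With the session above, `timed_get(driver, 'http://www.trivago.com/')` would return a tuple whose last element is `True` if the page fails the same way, which makes it easier to compare the AWS machine against the local one URL by URL.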