I'd like to download a web page generated by JavaScript and store it in a string variable in Python code. The page is generated when you click a button.

If I knew the resulting URL I would use urllib2, but that is not the case.

Thank you.

xralf
  • Is this generated completely in JS or just built from an AJAX call? – Bite code Jan 22 '12 at 10:14
  • @e-satis I think that it's completely in JS – xralf Jan 22 '12 at 10:39
  • Then I'd go with J.F.'s solution, or with Python WebKit. Just keep in mind that they require a display server to be running, so if you plan to run this on a headless server, you'll need to hack a little bit. – Bite code Jan 22 '12 at 11:14

1 Answer

You could use Selenium Webdriver:

#!/usr/bin/env python
from contextlib import closing
from selenium.webdriver import Firefox  # pip install selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = 'http://example.com/'  # placeholder: the page that contains the button

# use firefox to get page with javascript generated content
with closing(Firefox()) as browser:
    browser.get(url)
    # find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4
    button = browser.find_element(By.NAME, 'button')
    button.click()
    # wait for the page to load
    WebDriverWait(browser, timeout=10).until(
        lambda x: x.find_element(By.ID, 'someId_that_must_be_on_new_page'))
    # store it to string variable
    page_source = browser.page_source
print(page_source)
jfs
  • Is the `WebDriverWait` with `someId_that_must_be_on_new_page` necessary? Could it be done with just some `sleep` or `delay` function? And is it possible to set the user-agent string? – xralf Jan 22 '12 at 10:45
  • There is still one problem. The web page has a `select` element and something has to be selected. If nothing is selected, the button won't work. And is it necessary to open and close Firefox? Won't this work without `quit`? – xralf Jan 22 '12 at 11:04
  • You could use any condition you like, e.g., `x.title == 'New Title'`. You could probably modify the user-agent by using an appropriate Firefox profile. – jfs Jan 22 '12 at 11:11
  • here's an example on how to [select option](https://gist.github.com/1411564). `.quit()` is not necessary. – jfs Jan 22 '12 at 11:15
  • The method `select_option(self, selector, value)` takes `selector` parameter. I'm not sure what this parameter should be. Let's say I want to click on option with `value = 100` of `select` with `id = 'sel_id'` and `name = 'sel_name'`. Could this be expressed in `CSS`? – xralf Jan 22 '12 at 13:26
  • @xralf: `select_option('select#sel_id', '100')`. You could pass an element instead `select_option(browser.find_element_by_id('sel_id'), '100')`. – jfs Jan 22 '12 at 13:44
  • Thanks. I already used `options = browser.find_elements_by_tag_name('option') for option in options: if option.get_attribute('value') == "100": option.click()` and it worked too. – xralf Jan 22 '12 at 13:52
  • Can this be done with the Firefox window opened in the background? – alper Dec 27 '20 at 23:12
  • @alper yes, there are headless options – jfs Sep 09 '21 at 11:07