
I'm crawling a site with Selenium, and I want to get an element (such as a link) that JavaScript writes into the page after Selenium simulates a click on a fake link.

I tried get_html_source(), but it doesn't include the content written by JavaScript.

Code I've written:

    def test_comment_url_fetch(self):
        sel = self.selenium 
        sel.open("/rmrb")
        url = sel.get_location()
        #print url
        if url.startswith('http://login'):
            sel.open("/rmrb")
        i = 1
        while True:
            try:
                if i == 1:
                    sel.click("//div[@class='WB_feed_type SW_fun S_line2']/div/div/div[3]/div/a[4]") 
                    print "click"
                else:
                    XPath = "//div[@class='WB_feed_type SW_fun S_line2'][%d]/div/div/div[3]/div/a[4]"%i
                    sel.click(XPath)
                    print "click"
            except Exception, e:
                print e
                break
            i += 1
        html = sel.get_html_source()
        html_file = open("tmp\\foo.html", 'w')
        html_file.write(html.encode('utf-8'))
        html_file.close()

I use a while loop to click a series of fake links that trigger JavaScript actions to show extra content, and that extra content is what I want. But sel.get_html_source() doesn't return it.

Can anybody help? Thanks a lot.

Friedmannn

3 Answers


Since I usually do post-processing on the fetched nodes I run JavaScript directly in the browser with execute_script. For example to get all a-tags:

    js_code = "return document.getElementsByTagName('a')"
    your_elements = sel.execute_script(js_code)

Edit: execute_script and get_eval are equivalent, except that get_eval performs an implicit return, whereas in execute_script the return has to be stated explicitly.
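Here is a slightly fuller sketch of the execute_script approach, written for Python 3. It assumes a WebDriver-style object (the API where execute_script lives), not the Selenium RC object from the question. The helper name collect_hrefs and the constant JS_COLLECT_HREFS are made up for illustration. The script reads the live DOM, so links added by JavaScript after a click should be included.

```python
# JavaScript that runs inside the page. execute_script needs the explicit
# `return`; without it the call yields None.
JS_COLLECT_HREFS = """
var anchors = document.getElementsByTagName('a');
var hrefs = [];
for (var i = 0; i < anchors.length; i++) {
    if (anchors[i].href) {
        hrefs.push(anchors[i].href);
    }
}
return hrefs;
"""


def collect_hrefs(driver):
    """Return the href of every <a> element currently in the live DOM.

    `driver` is assumed to be any object exposing WebDriver's
    execute_script(script) method.
    """
    hrefs = driver.execute_script(JS_COLLECT_HREFS)
    # Normalise a missing result to an empty list so callers can iterate.
    return list(hrefs or [])
```

With a real browser this would be used roughly as: create the driver with webdriver.Firefox(), load the page with driver.get(url), perform the clicks, then call collect_hrefs(driver). Returning plain strings instead of element objects keeps the result usable even if the page changes afterwards.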

Michael W
  • Thanks so much. Though the correct method is `sel.get_eval(js_code)`. And I found this [Question](http://stackoverflow.com/questions/2469701/how-do-i-get-javascript-results-using-selenium?rq=1) – Friedmannn Apr 18 '13 at 07:27

Can't you just call the browser object inside your selenium environment? For example:

    self.browser.find_elements_by_tag_name("div")

This should return a list of div elements. You can also find elements by class, id, and so on.

Edit: Below is the code that creates the 'browser' object.

    from selenium import webdriver

    # The browser object. I use Firefox, but Chrome, IE, and Safari should work too, I believe.
    self.browser = webdriver.Firefox()

Then you should be able to call find_elements_by_tag_name as shown above.
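Because the content in the question only appears after a click triggers JavaScript, a lookup made immediately afterwards can come back empty. Below is a minimal polling sketch in Python 3 that retries a lookup until it returns something. The helper name wait_for and its parameters are hypothetical. WebDriver also ships its own WebDriverWait class for this purpose, so treat this as an illustration of the idea, not a replacement for it.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.5):
    """Call `fetch` repeatedly until it returns a truthy value.

    `fetch` is any zero-argument callable, for example a lambda wrapping
    an element lookup. Raises TimeoutError if nothing truthy is returned
    within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no result within %.1f seconds" % timeout)
        # Give the page's JavaScript time to update the DOM before retrying.
        time.sleep(interval)
```

Usage would look like wait_for(lambda: self.browser.find_elements_by_tag_name("div")). Note that newer Selenium releases replace find_elements_by_tag_name with find_elements(By.TAG_NAME, "div"), so the lookup inside the lambda depends on the installed version.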

Colby R Meier
  • Sorry, I didn't give the whole code of my class. 'self' here is a unittest.TestCase object, which doesn't have a 'browser' attribute. And 'sel' is a selenium object; I tried, but it doesn't have 'browser' either. – Friedmannn Apr 18 '13 at 03:41
  • @Friedmannn I have included the code in my post to create the browser object. Just an extra 2 lines to import and define it. Enjoy. – Colby R Meier Apr 18 '13 at 15:24
  • oh I see. I'm gonna try it – Friedmannn Apr 19 '13 at 02:10

You need a browser engine that can execute JavaScript, such as PhantomJS. Changes made by JavaScript are visible only to clients that can execute JavaScript and that provide a DOM and runtime in which events can fire.

Also closely related: Executing Javascript from Python

Andrew Ty.