I am new to Selenium
and web applications. Please bear with me for a second if my question seems way too obvious. Here is my story.
I have written a scraper in Python
that uses Selenium2.0 Webdriver
to crawl AJAX web pages. One of the biggest challenge (and ethics) is that I do not want to burn down the website's server. Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed.
I have done some google-searches. It seems like only selenium-RC
provides such a functionality. However, I do not want to rewrite my code just for this reason. As a compromise, I decided to limit the rate of method calls that potentially lead to the headless browser firing requests to the server.
In the script, I have the following kind of method calls:
driver.find_element_by_XXXX()
driver.execute_script()
webElement.get_attribute()
webElement.text
I use the second function to scroll to the bottom of the window and get the AJAX content, like the following:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Based on my intuition, only the second function will trigger request firing, since others seem like parsing existing html content.
Is my intuition wrong?
Many many thanks
Perhaps I should elaborate more. I am automating a process of crawling on a website in Python
. There is a subtantial amount of work done, and the script is running without large bugs.
My colleagues, however, reminded me that if in the process of crawling a page I made too many requests for the AJAX list within a short time, I may get banned by the server. This is why I started looking for a way to monitor the number of requests I am firing from my headless PhantomJS
browswer in script.
Since I cannot find a way to monitor the number of requests in script, I made the compromise I mentioned above.