
I am new to Selenium and web applications. Please bear with me for a second if my question seems way too obvious. Here is my story.

I have written a scraper in Python that uses Selenium 2.0 WebDriver to crawl AJAX web pages. One of my biggest concerns (practical and ethical) is that I do not want to overload the website's server. Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed.

I have done some Google searches. It seems that only Selenium RC provides such functionality. However, I do not want to rewrite my code just for this. As a compromise, I decided to limit the rate of the method calls that could cause the headless browser to fire requests at the server.
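A minimal sketch of that compromise: a decorator that enforces a minimum gap between successive calls of a function. The `throttle` decorator and `scroll_to_bottom` function are hypothetical helpers, not part of Selenium, and the 2-second interval is an arbitrary assumption to tune for the target site.

```python
import functools
import time


def throttle(min_interval, clock=time.monotonic, sleep=time.sleep):
    """Hypothetical decorator: make sure at least `min_interval` seconds
    pass between successive calls to the wrapped function."""
    def decorator(func):
        last_call = None  # time of the previous call, None before the first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)  # called too soon: pause before proceeding
            last_call = clock()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@throttle(min_interval=2.0)  # assumed interval, not a recommended value
def scroll_to_bottom(driver):
    # The call suspected of triggering the AJAX requests.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```

The `clock` and `sleep` parameters exist only so the timing can be checked without actually waiting.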

In the script, I have the following kinds of method calls:

driver.find_element_by_XXXX()
driver.execute_script()
webElement.get_attribute()
webElement.text
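To see how often each of these is actually called without changing the rest of the script, the driver could be wrapped in a counting proxy. This is a hypothetical helper, not a Selenium feature. It counts calls made by the script, not HTTP requests sent by the browser, and it does not see calls on elements returned by `find_element_*` or property reads such as `.text`.

```python
import collections


class CallCounter:
    """Hypothetical proxy: forwards attribute access to a wrapped object
    (e.g. a WebDriver instance) and counts calls per method name."""

    def __init__(self, target):
        self._target = target
        self.counts = collections.Counter()

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr  # plain attributes pass through uncounted

        def counted(*args, **kwargs):
            self.counts[name] += 1
            return attr(*args, **kwargs)
        return counted
```

With the Selenium 2.x/3.x Python bindings this would be used as `driver = CallCounter(webdriver.PhantomJS())`, after which `driver.counts` shows e.g. how many times `execute_script` ran on a page.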

I use the second call to scroll to the bottom of the window so that the AJAX content loads, like this:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
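A common pattern built on that call is to scroll repeatedly, pausing between scrolls, until the page height stops growing. The sketch below assumes that an unchanged `document.body.scrollHeight` means no more AJAX content is coming; the `pause` and `max_scrolls` defaults are arbitrary assumptions, and the pause also serves to space out requests.

```python
import time

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight;"


def scroll_until_stable(driver, pause=2.0, max_scrolls=50, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing or
    `max_scrolls` is reached; return the number of scrolls performed."""
    last_height = driver.execute_script(HEIGHT_JS)
    for scrolls in range(1, max_scrolls + 1):
        driver.execute_script(SCROLL_JS)
        sleep(pause)  # give the AJAX list time to load
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            return scrolls  # nothing new was loaded; stop scrolling
        last_height = new_height
    return max_scrolls
```

The `max_scrolls` cap puts an upper bound on how many AJAX loads a single page can trigger, even for a very long list.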

My intuition is that only the second call triggers requests, since the others seem to just parse existing HTML content.

Is my intuition wrong?

Many thanks!


Perhaps I should elaborate. I am automating the crawling of a website in Python. A substantial amount of work has been done, and the script runs without major bugs.

My colleagues, however, reminded me that if I request the AJAX list too many times within a short period while crawling a page, the server may ban me. This is why I started looking for a way to monitor, from within the script, the number of requests my headless PhantomJS browser fires.
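Since the worry is "too many requests within a short time", a sliding-window limiter matches it more directly than a fixed delay: it allows at most N calls in any T-second window. This is a hypothetical helper, and the limits are assumptions to tune per site; it limits the script's own calls, not the browser's actual requests.

```python
import collections
import time


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_calls` calls in any
    `period`-second window, blocking until a call is allowed."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = collections.deque()  # timestamps of recent calls

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have fallen out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
```

Usage would be to call `limiter.acquire()` right before each `driver.execute_script(...)` that scrolls, e.g. with `limiter = SlidingWindowLimiter(max_calls=5, period=60)`.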

Since I cannot find a way to monitor the number of requests from within the script, I made the compromise described above.

Patrick the Cat

1 Answer


Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed

As far as I know, the number of requests depends on the webpage's design, i.e. the resources the page loads and the requests made by its JavaScript/AJAX code. WebDriver opens a browser and loads the webpage just like a normal user would.
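Because WebDriver drives a real browser, one way to count requests from inside the script is to ask the browser itself through the W3C Resource Timing API (`performance.getEntriesByType('resource')`). This is a sketch under assumptions: it only works if the browser implements that API (I have not verified this for every PhantomJS version), it does not include the main page navigation, browsers cap the size of the entry buffer, and requests still in flight when the action returns are not yet counted. `count_resource_requests` and `requests_triggered_by` are hypothetical helper names.

```python
RESOURCE_COUNT_JS = (
    "return window.performance && window.performance.getEntriesByType"
    " ? window.performance.getEntriesByType('resource').length : null;"
)


def count_resource_requests(driver):
    """Number of resource requests (scripts, images, XHR/AJAX, ...) the
    current page has recorded, or None if the API is unavailable."""
    return driver.execute_script(RESOURCE_COUNT_JS)


def requests_triggered_by(driver, action):
    """Run `action()` and return how many new resource entries appeared
    (None if the count is unavailable)."""
    before = count_resource_requests(driver)
    action()
    after = count_resource_requests(driver)
    if before is None or after is None:
        return None
    return after - before
```

This also gives a way to test the intuition in the question: compare `requests_triggered_by(driver, lambda: driver.find_element_by_id("..."))` with the same measurement around the scroll call, waiting briefly after the scroll so the AJAX requests have time to complete.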

In Chrome, you can inspect requests and responses in the Developer Tools panel; you can refer to this post. The Developer Tools UI has changed since then, but the basic functions are the same. Alternatively, you can use the Firebug plugin in Firefox.


Update:

Another method to check the requests and responses is by using Wireshark. Please refer to these Wireshark filters.

userpal
  • Hmm... But I am scraping a website, so I cannot use Developer Tools, and I am running a headless browser. I understand that the number of requests depends on how long the AJAX list is. Is there a function I can call to get the number of requests made since the page's first request? – Patrick the Cat Jul 29 '14 at 22:29
  • My real question is: which functions fire requests? But if you can help me with the question above, that would make an accepted answer too :) – Patrick the Cat Jul 29 '14 at 22:31
  • @Mai Sorry for the late reply; I was offline. I have updated my answer above. – userpal Jul 30 '14 at 13:43