
I'm using Selenium in Python to test a data table (not a real HTML table; it is built from multiple divs).

This is what my table looks like:

<div class="products">
    <div class="product">
        <span class="original-price">20$</span>
        <span class="discounted-price">10$</span>            
    </div>

    <div class="product">
        <span class="price">20$</span>
    </div>

    ...
</div>

There are multiple products, and some of them have a discounted price.

This is my script:

products = self.driver.find_elements_by_css_selector('.products > div')
for product in products:
    found_price = True
    try:
        original_price = product.find_element_by_css_selector('.original-price').text
        reduced_price = product.find_element_by_css_selector('.discounted-price').text
    except NoSuchElementException:
        try:
            original_price = product.find_element_by_css_selector('.price').text
            reduced_price = original_price
        except NoSuchElementException:
            found_price = False

    if found_price:
        check_price(original_price, reduced_price)

But my script runs very slowly. Every call to "find_element_by_css_selector" sends a request through "remote_connection", like this one:

2018-02-27 13:48:08 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:62147/session/14902b71a0f812fa74f81524f0eb1386/elements {"using": "css selector", "sessionId": "14902b71a0f812fa74f81524f0eb1386", "value": ".products > div .original-price"}
2018-02-27 13:48:08 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request

Any ideas on how to improve its performance?

Thanks!

Trong Lam Phan
  • Can you update the question to explain why you feel `my script runs very slowly`, or add some evidence that it `sends a lot of request "remote_connection" each time the "find_element_by_css_selector" called`? – undetected Selenium Feb 26 '18 at 06:51
  • @DebanjanB I updated my question with the Selenium logs. Thanks – Trong Lam Phan Feb 27 '18 at 13:52
  • Because my table has one hundred rows, there are about 200-300 lines with "remote_connection". I wonder whether that is normal or whether I can optimize it. – Trong Lam Phan Feb 27 '18 at 13:58

2 Answers

One of the parameters of the remote WebDriver is keep_alive. As the docs say:

keep_alive - Whether to configure remote_connection.RemoteConnection to use
             HTTP keep-alive. Defaults to False.

Keeping the connection alive should improve speed, because the client does not have to open a new connection for every find request.

jordiburgos
  • Thank you! Do you know how to set keep_alive = True? This is my init, but it doesn't work if I add keep-alive as an argument of chrome_options: `chrome_options = webdriver.ChromeOptions()` `chrome_options.add_argument('headless')` `chrome_options.add_argument('no-sandbox')` `chrome_options.add_argument('keep-alive')` `self.driver = webdriver.Chrome(chrome_options=chrome_options)` – Trong Lam Phan Feb 25 '18 at 19:40
  • Seems like chromedriver does not support keepalive – Trong Lam Phan Feb 25 '18 at 20:18

It is still inconclusive why you feel your script runs very slowly. As you mentioned, your code sends a remote_connection request each time find_element_by_css_selector is called. That is exactly the behavior defined in the WebDriver W3C Candidate Recommendation.

A small test with the search box on the Google home page (https://www.google.co.in), run against all the major WebDriver and web browser variants, reveals that:

  • Each time you search for a web element in the HTML DOM as follows:

    product.find_element_by_css_selector('.original-price')
    
  • The following request is generated:

    [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:62147/session/14902b71a0f812fa74f81524f0eb1386/elements {"using": "css selector", "sessionId": "14902b71a0f812fa74f81524f0eb1386", "value": ".products > div .original-price"}
    [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
    
  • On a successful search, the following response is sent back from the web browser:

    webdriver::server   DEBUG   <- 200 OK {"value":{"element-6066-11e4-a52e-4f735466cecf":"6e35faa4-233f-400c-a6c7-6a66b54a69e5"}}
    

You can find a detailed discussion in Values returned by webdrivers
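
Since every find call costs one round trip, one way to reduce the total is to collect all the prices inside the browser with a single `execute_script` call. The following is only a minimal sketch under assumptions: the names `COLLECT_PRICES_JS` and `collect_prices` are hypothetical, the class names are the ones from the question's markup, and the script reads `textContent`, which can differ from Selenium's `.text` when an element is hidden by CSS.

```python
# JavaScript executed in the browser: visit every product row once and
# return plain data, so the Python side needs a single request.
# (Hypothetical helper; selectors are taken from the question's markup.)
COLLECT_PRICES_JS = """
return Array.from(document.querySelectorAll('.products > div')).map(function (row) {
    var text = function (selector) {
        var node = row.querySelector(selector);
        return node ? node.textContent.trim() : null;
    };
    return {
        original: text('.original-price'),
        discounted: text('.discounted-price'),
        price: text('.price')
    };
});
"""


def collect_prices(driver):
    """Return a list of (original_price, reduced_price) tuples.

    `driver` is any object that exposes WebDriver's execute_script().
    Rows with no recognizable price are skipped, mirroring the
    found_price flag in the question's loop.
    """
    rows = driver.execute_script(COLLECT_PRICES_JS)
    prices = []
    for row in rows:
        if row["original"] is not None and row["discounted"] is not None:
            # Discounted product: both spans are present.
            prices.append((row["original"], row["discounted"]))
        elif row["price"] is not None:
            # Regular product: the reduced price equals the original.
            prices.append((row["price"], row["price"]))
    return prices
```

In the question's test this could be used as `for original, reduced in collect_prices(self.driver): check_price(original, reduced)`. The trade-off is that the lookup logic moves into JavaScript, so a missing element no longer raises NoSuchElementException on the Python side.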

undetected Selenium