12

I'm using Selenium with Python 2.7 to retrieve the contents of a search box on a webpage. The search box dynamically retrieves the results and displays them in the box itself.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import re
from time import sleep

driver = webdriver.Firefox()
driver.get(url)

df = pd.read_csv("read.csv")

def crawl(isin):
    searchkey = driver.find_element_by_name("searchkey")
    searchkey.clear()
    searchkey.send_keys(isin)
    sleep(11)

    search_result = driver.find_element_by_class_name("ac_results")
    names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
    product_id = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    return pd.Series([product_id, names])

df[["insref", "name"]] = df["ISIN"].apply(crawl)

print df

Relevant part of the code may be found under def crawl(isin):

  • The program enters what to search for in the search box (the box is badly named as searchkey).
  • It then does sleep() and waits for the content to show in the search box dropdown field ac_results.
  • It then extracts two values with regex: the insrefs (stored in product_id) and names.

Instead of calling sleep(), I would like for it to wait for the content in the WebElement ac_results to load.

Since it will continuously use the search box to get new data by entering new search terms from a list, one could perhaps use Regex to identify when there is new content in ac_results that is not identical to the previous content.

Is there a method for this? It is important to note that the content in the search box is dynamically loaded, so the function would have to recognise that something has changed in the WebElement.

P A N

3 Answers

23

You need to apply the Explicit Wait concept. E.g. wait for an element to become visible:

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'searchbox')))

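Here, it waits up to 10 seconds, checking the visibility of the element every 500 ms (the default poll frequency), and raises a TimeoutException if the element never becomes visible. WebDriverWait is imported from selenium.webdriver.support.ui, EC is the selenium.webdriver.support.expected_conditions module, and By comes from selenium.webdriver.common.by.

There is a set of built-in Expected Conditions to wait for, and it is also easy to write your own custom Expected Condition.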
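To make the polling behaviour concrete, here is a rough sketch of what an explicit wait does with a condition. It is a simplification written without Selenium, not the library's actual implementation: poll_until and WaitTimeout are hypothetical names, and the real WebDriverWait additionally ignores NoSuchElementException between polls by default. The point is that a condition is just a callable that takes the driver and returns something truthy when the wait should end.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def poll_until(condition, driver, timeout=10.0, poll_interval=0.5):
    """Call condition(driver) repeatedly until it returns something truthy."""
    deadline = time.time() + timeout
    while True:
        value = condition(driver)
        if value:
            # The truthy result is handed back to the caller
            return value
        if time.time() > deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

Any object with a __call__(self, driver) method, or a plain function taking the driver, can therefore serve as a custom condition.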


FYI, here is how we approached it after brainstorming it in the chat. We've introduced a custom Expected Condition that would wait for the element text to change. It helped us to identify when the new search results appear:

import re

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

class text_to_change(object):
    """Expected Condition: true once the located element's text differs from `text`."""
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        # Use the public find_element(by, value) instead of the private _find_element helper
        actual_text = driver.find_element(*self.locator).text
        return actual_text != self.text

#Load URL
driver = webdriver.Firefox()
driver.get(url)

#Load DataFrame of terms to search for
df = pd.read_csv("searchkey.csv")

#Crawling function    
def crawl(searchkey):
    try: 
        text_before = driver.find_element_by_class_name("ac_results").text 
    except NoSuchElementException: 
        text_before = ""

    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchkey)
    print "\nSearching for %s ..." % searchkey

    WebDriverWait(driver, 10).until(
        text_to_change((By.CLASS_NAME, "ac_results"), text_before)
    )

    search_result = driver.find_element_by_class_name("ac_results")
    if search_result.text != "none":
        # Result looks like "Name (12345)": text before "(" is the name, digits after it the insref
        names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
        insrefs = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    else:
        # No hit for this search key: nothing to parse
        names = None
        insrefs = []
    return pd.Series([insrefs, names])

#Run crawl    
df[["Insref", "Name"]] = df["ISIN"].apply(crawl)

#Print DataFrame    
print df
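
The parsing step inside crawl can also be pulled out of the browser code so it can be checked without Selenium. The following is a minimal sketch, not part of the original solution: parse_result and RESULT_PATTERN are hypothetical names, and it assumes one result per line of the form "Name ABC123 (01234)", with anything else (such as "none") treated as no match.

```python
import re

# Text followed by digits in parentheses at the end, e.g. "Name ABC123 (01234)"
RESULT_PATTERN = re.compile(r"^(?P<name>.*?)\s*\((?P<insref>[0-9]+)\)\s*$")


def parse_result(text):
    """Return (name, insref) for one result line, or (None, None) if it does not match."""
    match = RESULT_PATTERN.match(text.strip())
    if match is None:
        return None, None
    return match.group("name"), match.group("insref")
```

If the dropdown can show several results, one option would be to call it per line, for example on each item of search_result.text.splitlines().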
alecxe
  • It's not quite that easy, because the `searchbox` element will load instantly when opening the page. It's when I enter the `searchkey` into the element, it'll take up to 8-9 s until the text content in the element loads. It is that content that I would like to wait for. – P A N Jun 21 '15 at 13:24
  • @Winterflags yeah, I've just provided an example and a hint :) `text_to_be_present_in_element` is probably a good candidate in your case if you know which text to wait for. If not, then you would need a custom expected condition. – alecxe Jun 21 '15 at 13:26
  • Thank you very much, I followed your custom expected condition for regex as seen here: http://stackoverflow.com/questions/28240342/perform-a-webdriverwait-or-similar-check-on-a-regular-expression-in-python. I managed to get it waiting for the first reply that matches a pattern, but once that pattern is present it continues to run the loop, searching for all searchkeys without giving the results time to load. Do you have any ideas for how to make it wait for new content that matches the same pattern but is different? – P A N Jun 21 '15 at 13:56
  • The pattern looks like this: `"Name ABC123 (01234)"`, `"Something 123DEF (432134)"`, `"Somethingsomething 123 GHI (07451)"`. What is constant is that there is text followed by a series of numbers of variable length within parenthesis in the end. – P A N Jun 21 '15 at 13:58
  • @Winterflags thanks for trying it out. Could you show me the code you have so far, so that I can start with it? Thanks. – alecxe Jun 21 '15 at 14:02
  • Thank you very much for taking a look at it! I've updated the question with the full code. Unfortunately I don't have a dummy site for you to test it on, as the original is behind user/pw. – P A N Jun 21 '15 at 14:13
  • @Winterflags sure, correct me if I'm wrong: you want to wait until the text of the "searchbox" matches all of the pre-defined regex patterns?.. – alecxe Jun 21 '15 at 14:32
  • Sorry for the late reply. The text retrieved in searchbox will always follow the same pattern `names (insref)` unless it is `"none"`, e.g. `"abcdef (1234)"`. What will change is the text and numbers in there, and they may be of variable length and contain spaces. So in the first instance, as the code is doing now, I would like to wait for text to load according to the pattern. But in the second and following instances, I would like it to wait for the content in the pattern to change. – P A N Jun 21 '15 at 14:46
  • @Winterflags I'll be afk for a while, will see what I can do later today, sorry. – alecxe Jun 21 '15 at 14:47
  • I added a written description to the question of what the code does now. Hope that clarifies a bit what we're looking for. – P A N Jun 21 '15 at 15:10
  • Solved this thanks to alecxe in chat! Super helpful. The custom expected condition above will prove useful for Selenium users waiting for dynamic text content to appear in a WebElement. – P A N Jun 21 '15 at 18:16
  • @Winterflags thanks so much for the additional bounty! – alecxe Mar 25 '16 at 18:55
  • You're welcome! As a recognition of your helpfulness :) – P A N Mar 25 '16 at 19:02
1

I suggest using one of the following Expected Conditions with WebDriverWait (both are defined in selenium.webdriver.support.expected_conditions).

WebDriverWait(driver, 10).until(
    text_to_be_present_in_element((By.CLASS_NAME, "searchbox"), r"((?<=\()[0-9]*)")
)

or

WebDriverWait(driver, 10).until(
    text_to_be_present_in_element_value((By.CLASS_NAME, "searchbox"), r"((?<=\()[0-9]*)")
)
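
Note that the built-in text_to_be_present_in_element performs a plain substring test on the element's text (and text_to_be_present_in_element_value does the same on its value attribute), so a regular expression passed as the text is compared literally. A regex-aware variant has to be a custom condition. The following is a minimal sketch under that assumption: text_to_match is a hypothetical name, and the class only relies on the driver's public find_element(by, value) method.

```python
import re


class text_to_match(object):
    """Hypothetical custom condition: truthy once the element's text matches `pattern`."""

    def __init__(self, locator, pattern):
        self.locator = locator  # e.g. (By.CLASS_NAME, "ac_results")
        self.pattern = re.compile(pattern)

    def __call__(self, driver):
        # find_element(by, value) is the public WebDriver lookup method
        element_text = driver.find_element(*self.locator).text
        # A match object is truthy, None is falsy, so the wait keeps polling until a match
        return self.pattern.search(element_text)
```

It would be passed to the wait like any built-in condition, for example WebDriverWait(driver, 10).until(text_to_match((By.CLASS_NAME, "ac_results"), r"\([0-9]+\)")). On its own it cannot tell a fresh result from a stale one that already matches the pattern.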
Manu
  • If I'm not mistaken that will indeed wait for the very first text reply in the search box to be loaded. But if the program inserts a new searchterm right afterwards, it will recognize the pattern from the first result and not wait for the second result to load. See my explanation under "What the code does now" in OP. – P A N Jun 21 '15 at 16:09
  • WebDriverWait, which we are using here, is an example of an explicit wait, which means we need to set the wait before every element lookup. That's why we set an implicit wait at the start, which applies to every element lookup. – Manu Jun 21 '15 at 17:01
  • I believe the best is to use sleep here or write a function to wait for JQuery calls to complete. – Manu Jun 21 '15 at 17:02
1

Create a class for the wait condition:

class SubmitChanged(object):
    def __init__(self, element):
        self.element = element

    def __call__(self, driver):
        # Look the element up again and check whether it is a new instance
        new_element = driver.find_element_by_xpath('<your xpath>')
        return new_element != self.element

Then call it in your program:

wait = WebDriverWait(<driver object>, 3)
wait.until(SubmitChanged(<web element>))

More info at https://selenium-python.readthedocs.io/waits.html

jay fegade