
My problem:

I am trying to crawl Google's People Also Ask box with Selenium, and my code is written in Python, but I run into a problem when the internet connection is slow. When I click a question to load more, Google shows a loading icon with this HTML:

<g-loading-icon jsname="aZ2wEe" class="nhGGkb S3PB2d" style="height: 24px; width: 24px; display: none;"><img height="24" src="//www.gstatic.com/ui/v1/activityindicator/loading_24.gif" width="24" alt="Đang tải..." role="progressbar" data-atf="0" data-frt="0"></g-loading-icon>

Note: the icon is shown after clicking to load more People Also Ask results when the connection is slow. When loading completes, g-loading-icon is hidden again.

From my testing, I think the XPath can change at any time along with the structure of the Google results page. So I want the code to wait until loading has completed before crawling, so that it does not fail. If it does not wait for loading to complete, the code raises this error: IndexError: string index out of range.

I don't want to use time.sleep because I don't think it is the best way.

Case 1: I tried to locate the icon with XPath. But the XPath will change when the structure of the Google results changes.

This is my code for case 1:

def click_more_gpaa(order):
    # Click button question
    # Condition for check load question
    short_timeout  = 10   # give enough time for the loading element to appear
    long_timeout = 30  # give enough time for loading to finish
    loading_element_xpath = '/html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/g-loading-icon'
    loading_element_css_selector = 'nhGGkb.S3PB2d'

    try:

        # Case 1: Test with Xpath
        # wait for loading element to appear
        # - required to prevent prematurely checking if element
        #   has disappeared, before it has had a chance to appear

        is_gppa_show = WebDriverWait(driver, short_timeout).until(
            EC.presence_of_element_located((By.XPATH, loading_element_xpath))
        )

        # then wait for the element to disappear

        is_gppa_show = WebDriverWait(driver, long_timeout).until_not(
            EC.presence_of_element_located((By.XPATH, loading_element_xpath)))
        
    except TimeoutException:
        # if timeout exception was raised - it may be safe to 
        # assume loading has finished, however this may not 
        # always be the case, use with caution, otherwise handle
        # appropriately.
        pass

Case 2: I tried to locate the icon with a CSS selector, but it does not work.

This is my code for case 2:

def click_more_gpaa(order):
    # Click button question
    # Condition for check load question
    short_timeout  = 10   # give enough time for the loading element to appear
    long_timeout = 30  # give enough time for loading to finish
    loading_element_xpath = '/html/body/div[7]/div/div[9]/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/g-loading-icon'
    loading_element_css_selector = 'nhGGkb.S3PB2d'

    try:

        # Case 2: Test with CSS_SELECTOR
        # wait for loading element to appear
        # - required to prevent prematurely checking if element
        #   has disappeared, before it has had a chance to appear

        is_gppa_show = WebDriverWait(driver, short_timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, loading_element_xpath))
        )

        # then wait for the element to disappear

        is_gppa_show = WebDriverWait(driver, long_timeout).until_not(
            EC.presence_of_element_located((By.CSS_SELECTOR, loading_element_xpath)))        

    except TimeoutException:
        # if timeout exception was raised - it may be safe to 
        # assume loading has finished, however this may not 
        # always be the case, use with caution, otherwise handle
        # appropriately.
        pass

Is there a way to do this, or a solution for this?

Or is there any documentation or link I can read about waits in Selenium with Python?

I tried to research this, but there is a lot of documentation about waits in Selenium and Python, and I got confused.

Thank you so much!

undetected Selenium

1 Answer


To wait for the <g-loading-icon> to disappear, use WebDriverWait with the invisibility_of_element_located() expected condition, together with any one of the following locator strategies:

  • Using TAG_NAME:

    WebDriverWait(driver, 20).until(EC.invisibility_of_element_located((By.TAG_NAME, "g-loading-icon")))
    
  • Using CSS_SELECTOR:

    WebDriverWait(driver, 20).until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "g-loading-icon > img[role='progressbar']")))
    
  • Using XPATH:

    WebDriverWait(driver, 20).until(EC.invisibility_of_element_located((By.XPATH, "//g-loading-icon/img[@role='progressbar']")))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    


undetected Selenium